Squeezeformer Initial Commit

Initial Commit

Co-authored-by: Albert Shaw <ashaw596@gmail.com>
Co-authored-by: Nicholas Lee <caldragon18456@berkeley.edu>
Co-authored-by: ani <aninrusimha@berkeley.edu>
Co-authored-by: dragon18456 <nicholas_lee@berkeley.edu>

Commit 18dbf036, authored by Sehoon Kim on May 29, 2022; committed by GitHub on May 29, 2022
20 changed files
--- a/.gitignore
+++ b/.gitignore
+build
+dist
+*.egg-info
+tensorflow
+externals
+.vim
+.session.vim
+Session.vim
+.idea
+.vscode
+__pycache__
+.pytest*
+venv
+my_train
+*.sw*
--- a/LICENSE
+++ b/LICENSE
+
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright [yyyy] [name of copyright owner]
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
--- a/README.md
+++ b/README.md
+# Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
+![Screenshot from 2022-05-22 15-23-07](https://user-images.githubusercontent.com/50283958/169718508-fa7fd22f-9038-44f8-9e8e-bc9c6afff124.png)
+
+We provide testing code for Squeezeformer, along with pre-trained checkpoints.
+
+## Install Squeezeformer
+
+We recommend Python 3.8.
+
+### 1. Install dependencies
+
+We support TensorFlow 2.5. Run one of the following commands, depending on your target device type.
+
+* Running on CPUs: `pip install -e '.[tf2.5]'`
+* Running on GPUs: `pip install -e '.[tf2.5-gpu]'`
+
+### 2. Install CTC decoder 
+```bash
+cd scripts
+bash install_ctc_decoders.sh
+```
+
+## Prepare Dataset
+
+### 1. Download LibriSpeech
+
+[LibriSpeech](https://ieeexplore.ieee.org/document/7178964) is a widely used ASR benchmark that consists of a 960-hour speech corpus with text transcriptions.
+The dataset consists of 3 training sets (`train-clean-100`, `train-clean-360`, `train-other-500`), 
+2 development sets (`dev-clean`, `dev-other`), and 2 test sets (`test-clean`, `test-other`).
+
+Download the datasets from this [link](http://www.openslr.org/12) and untar them.
+If this is for testing purposes only, you can skip the training datasets to save disk space.
+You should have flac files under `{dataset_path}/LibriSpeech`.
+
+### 2. Create Manifest Files
+
+Once you have downloaded the datasets, create manifest files that link each audio file path to its transcription.
+We use a script from [TensorFlowASR](https://github.com/TensorSpeech/TensorFlowASR).
+
+```bash
+cd scripts
+python create_librispeech_trans_all.py --data {dataset_path}/LibriSpeech --output {tsv_dir}
+```
+* `dataset_path` is the directory into which you untarred the datasets in the previous step.
+* This script creates tsv files under `tsv_dir` that list the audio file path, duration, and the transcription.
+* To skip processing the training datasets, use an additional argument `--mode test-only`.
+
+If you have followed the instructions correctly, you should have the following files under `tsv_dir`.
+* `dev_clean.tsv`, `dev_other.tsv`, `test_clean.tsv`, `test_other.tsv`
+* `train_clean_100.tsv`, `train_clean_360.tsv`, `train_other_500.tsv` (if not `--mode test-only`)
+* `train_other.tsv`, which merges all training tsv files into one (if not `--mode test-only`)
+
+
+## Testing Squeezeformer
+
+### 1. Download Pre-trained Checkpoints
+
+We provide pre-trained checkpoints for all variants of Squeezeformer.
+
+|      **Model**      |                                                  **Checkpoint**                            | **test-clean WER (%)** | **test-other WER (%)** |
+| :-----------------: | :---------------------------------------------------------------------------------------:  | :--------------------: | :--------------------: |
+|  Squeezeformer-XS   | [link](https://drive.google.com/file/d/1qSukKHz2ltBiWU-xHGmI-P9ziPJcLcSu/view?usp=sharing) |    3.74        |      9.09      |
+|  Squeezeformer-S    | [link](https://drive.google.com/file/d/1PGao0AOe5aQXc-9eh2RDQZnZ4UcefcHB/view?usp=sharing) |    3.08        |      7.47      |
+|  Squeezeformer-SM   | [link](https://drive.google.com/file/d/17cL1p0KJgT-EBu_-bg3bF7-Uh-pnf-8k/view?usp=sharing) |    2.79        |      6.89      |
+|  Squeezeformer-M    | [link](https://drive.google.com/file/d/1fbaby-nOxHAGH0GqLoA0DIjFDPaOBl1d/view?usp=sharing) |    2.56        |      6.50      |
+|  Squeezeformer-ML   | [link](https://drive.google.com/file/d/1-ZPtJjJUHrcbhPp03KioadenBtKpp-km/view?usp=sharing) |    2.61        |      6.05      |
+|  Squeezeformer-L    | [link](https://drive.google.com/file/d/1LJua7A4ZMoZFi2cirf9AnYEl51pmC-m5/view?usp=sharing) |    2.47        |      5.97      |
+
+
+### 2. Run Inference!
+
+Run the following commands:
+```bash
+cd examples/squeezeformer
+python test.py --bs {batch_size} --config configs/squeezeformer-S.yml --saved squeezeformer-S.h5 \
+    --dataset_path {tsv_dir} --dataset {dev_clean|dev_other|test_clean|test_other}
+```
+
+* `tsv_dir` is the directory path to the tsv manifest files that you created in the previous step.
+* You can test on other Squeezeformer models by changing `--config` and `--saved`, e.g., Squeezeformer-L or Squeezeformer-M.
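
Note on the README above (not part of the commit): the manifest files can be sanity-checked before running inference. The following is a minimal sketch under assumptions. The README only says that each tsv lists the audio file path, duration, and transcription; the header row, the column order, and the helper name `summarize_manifest` are illustrative guesses, and the sample rows are made up.

```python
import csv
import io


def summarize_manifest(tsv_text):
    """Return (number of utterances, total duration in seconds).

    Assumed layout: a header row, then tab-separated rows of
    path, duration, transcription.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the assumed header row
    count, total = 0, 0.0
    for row in reader:
        if len(row) < 3:
            continue  # ignore blank or malformed lines
        count += 1
        total += float(row[1])
    return count, total


# Made-up sample rows, for illustration only.
sample = (
    "PATH\tDURATION\tTRANSCRIPT\n"
    "/data/LibriSpeech/a.flac\t2.50\thello world\n"
    "/data/LibriSpeech/b.flac\t4.00\tgood morning\n"
)
print(summarize_manifest(sample))
```

For a real manifest, the same function could be applied to the text of a file such as `{tsv_dir}/test_clean.tsv`, provided the layout assumption holds.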
--- a/examples/squeezeformer/configs/conformer-L.yml
+++ b/examples/squeezeformer/configs/conformer-L.yml
+# Copyright 2020 Huy Le Nguyen (@usimarit)
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+speech_config:
+  sample_rate: 16000
+  frame_ms: 25
+  stride_ms: 10
+  num_feature_bins: 80
+  feature_type: log_mel_spectrogram
+  preemphasis: 0.97
+  normalize_signal: True
+  normalize_feature: True
+  normalize_per_frame: False
+
+decoder_config:
+  vocabulary: ../../sp.model
+
+model_config:
+  encoder_subsampling:
+    type: conv2d
+    filters: 512
+    kernel_size: 3
+    strides: 2
+  encoder_dmodel: 512
+  encoder_num_blocks: 18
+  encoder_head_size: 64
+  encoder_num_heads: 8
+  encoder_mha_type: relmha
+  encoder_kernel_size: 31
+  encoder_fc_factor: 0.5
+  encoder_dropout: 0.1
+
+  # time reduction
+  encoder_time_reduce_idx: null
+  encoder_time_recover_idx: null
+
+  encoder_conv_use_glu: true
+  encoder_ds_subsample: false
+  encoder_no_post_ln: false
+  encoder_adaptive_scale: false
+  encoder_fixed_arch:
+    - f
+    - m
+    - c
+    - f
+
+learning_config:
+  train_dataset_config:
+    augmentation_config:
+      time_masking:
+        num_masks: 10
+        p_upperbound: 0.05
+      freq_masking:
+        num_masks: 2
+        mask_factor: 27
+    data_paths: null
+    shuffle: True
+    cache: True
+    buffer_size: 100
+    drop_remainder: True
+    stage: train
+
+  eval_dataset_config:
+    data_paths: null
+    shuffle: False
+    cache: True
+    buffer_size: 100
+    drop_remainder: True
+    stage: eval
+
+  test_dataset_config:
+    data_paths: null
+    shuffle: False
+    cache: True
+    buffer_size: 100
+    drop_remainder: False
+    stage: test
+
+  optimizer_config:
+    beta_1: 0.9
+    beta_2: 0.98
+    epsilon: 1e-9
+
+  running_config:
+    num_epochs: 1000
+    filepath: null
+    checkpoint:
+      filepath: null
+      save_best_only: False
+      save_weights_only: True
+      save_freq: epoch
+    states_dir: null
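
Note on `speech_config` (not part of the commit): `frame_ms` and `stride_ms` are given in milliseconds, so at `sample_rate: 16000` they correspond to fixed sample counts. The sketch below works through that arithmetic. The frame-count formula assumes no padding, which may differ from what the feature extractor in this repository does; `frame_params` and `num_frames` are illustrative names.

```python
def frame_params(sample_rate, frame_ms, stride_ms):
    """Convert window length and hop from milliseconds to samples."""
    frame_length = sample_rate * frame_ms // 1000
    frame_step = sample_rate * stride_ms // 1000
    return frame_length, frame_step


def num_frames(num_samples, frame_length, frame_step):
    """Count frames that fit fully inside the signal (assumes no padding)."""
    if num_samples < frame_length:
        return 0
    return 1 + (num_samples - frame_length) // frame_step


frame_length, frame_step = frame_params(16000, 25, 10)
print(frame_length, frame_step)  # 400 160
print(num_frames(16000, frame_length, frame_step))  # 98 frames for one second of audio
```

Each of those frames would then carry `num_feature_bins: 80` log-mel values.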
--- a/examples/squeezeformer/configs/conformer-M.yml
+++ b/examples/squeezeformer/configs/conformer-M.yml
+# Copyright 2020 Huy Le Nguyen (@usimarit)
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+speech_config:
+  sample_rate: 16000
+  frame_ms: 25
+  stride_ms: 10
+  num_feature_bins: 80
+  feature_type: log_mel_spectrogram
+  preemphasis: 0.97
+  normalize_signal: True
+  normalize_feature: True
+  normalize_per_frame: False
+
+decoder_config:
+  vocabulary: ../../sp.model
+
+model_config:
+  encoder_subsampling:
+    type: conv2d
+    filters: 256
+    kernel_size: 3
+    strides: 2
+  encoder_dmodel: 256
+  encoder_num_blocks: 16
+  encoder_head_size: 64
+  encoder_num_heads: 4
+  encoder_mha_type: relmha
+  encoder_kernel_size: 31
+  encoder_fc_factor: 0.5
+  encoder_dropout: 0.1
+
+  # time reduction
+  encoder_time_reduce_idx: null
+  encoder_time_recover_idx: null
+
+  encoder_conv_use_glu: true
+  encoder_ds_subsample: false
+  encoder_no_post_ln: false
+  encoder_adaptive_scale: false
+  encoder_fixed_arch:
+    - f
+    - m
+    - c
+    - f
+
+learning_config:
+  train_dataset_config:
+    augmentation_config:
+      time_masking:
+        num_masks: 5
+        p_upperbound: 0.05
+      freq_masking:
+        num_masks: 2
+        mask_factor: 27
+    data_paths: null
+    shuffle: True
+    cache: True
+    buffer_size: 100
+    drop_remainder: True
+    stage: train
+
+  eval_dataset_config:
+    data_paths: null
+    shuffle: False
+    cache: True
+    buffer_size: 100
+    drop_remainder: True
+    stage: eval
+
+  test_dataset_config:
+    data_paths: null
+    shuffle: False
+    cache: True
+    buffer_size: 100
+    drop_remainder: False
+    stage: test
+
+  optimizer_config:
+    beta_1: 0.9
+    beta_2: 0.98
+    epsilon: 1e-9
+
+  running_config:
+    num_epochs: 1000
+    filepath: null
+    checkpoint:
+      filepath: null
+      save_best_only: False
+      save_weights_only: True
+      save_freq: epoch
+    states_dir: null
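
Note on `time_masking` (not part of the commit): the `num_masks` and `p_upperbound` settings suggest SpecAugment-style time masking in which each mask covers at most a fraction of the utterance. The sketch below illustrates that idea only. It assumes each mask width is drawn uniformly between 0 and `p_upperbound` times the number of frames; the augmentation code in this repository may sample differently, and `time_mask` is a hypothetical helper.

```python
import random


def time_mask(num_frames, num_masks, p_upperbound, rng):
    """Return a list of booleans; False marks a masked frame.

    Assumption: each mask width is drawn uniformly from
    [0, floor(p_upperbound * num_frames)].
    """
    keep = [True] * num_frames
    max_width = int(p_upperbound * num_frames)
    for _ in range(num_masks):
        width = rng.randint(0, max_width)
        start = rng.randint(0, num_frames - width)
        for t in range(start, start + width):
            keep[t] = False
    return keep


rng = random.Random(0)
keep = time_mask(num_frames=1000, num_masks=5, p_upperbound=0.05, rng=rng)
# With these settings, at most 5 * 50 = 250 frames can be masked.
print(keep.count(False))
```

Under this reading, raising `num_masks` from 5 to 10 doubles the worst-case masked span while leaving the maximum width of any single mask unchanged.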
--- a/examples/squeezeformer/configs/conformer-S.yml
+++ b/examples/squeezeformer/configs/conformer-S.yml
+# Copyright 2020 Huy Le Nguyen (@usimarit)
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+speech_config:
+  sample_rate: 16000
+  frame_ms: 25
+  stride_ms: 10
+  num_feature_bins: 80
+  feature_type: log_mel_spectrogram
+  preemphasis: 0.97
+  normalize_signal: True
+  normalize_feature: True
+  normalize_per_frame: False
+
+decoder_config:
+  vocabulary: ../../sp.model
+
+model_config:
+  encoder_subsampling:
+    type: conv2d
+    filters: 144
+    kernel_size: 3
+    strides: 2
+  encoder_dmodel: 144
+  encoder_num_blocks: 16
+  encoder_head_size: 36
+  encoder_num_heads: 4
+  encoder_mha_type: relmha
+  encoder_kernel_size: 31
+  encoder_fc_factor: 0.5
+  encoder_dropout: 0.1
+
+  # time reduction
+  encoder_time_reduce_idx: null
+  encoder_time_recover_idx: null
+
+  encoder_conv_use_glu: true
+  encoder_ds_subsample: false
+  encoder_no_post_ln: false
+  encoder_adaptive_scale: false
+  encoder_fixed_arch:
+    - f
+    - m
+    - c
+    - f
+
+learning_config:
+  train_dataset_config:
+    augmentation_config:
+      time_masking:
+        num_masks: 5
+        p_upperbound: 0.05
+      freq_masking:
+        num_masks: 2
+        mask_factor: 27
+    data_paths: null
+    shuffle: True
+    cache: True
+    buffer_size: 100
+    drop_remainder: True
+    stage: train
+
+  eval_dataset_config:
+    data_paths: null
+    shuffle: False
+    cache: True
+    buffer_size: 100
+    drop_remainder: True
+    stage: eval
+
+  test_dataset_config:
+    data_paths: null
+    shuffle: False
+    cache: True
+    buffer_size: 100
+    drop_remainder: False
+    stage: test
+
+  optimizer_config:
+    beta_1: 0.9
+    beta_2: 0.98
+    epsilon: 1e-9
+
+  running_config:
+    num_epochs: 1000
+    filepath: null
+    checkpoint:
+      filepath: null
+      save_best_only: False
+      save_weights_only: True
+      save_freq: epoch
+    states_dir: null
--- a/examples/squeezeformer/configs/squeezeformer-L.yml
+++ b/examples/squeezeformer/configs/squeezeformer-L.yml
+# Copyright 2020 Huy Le Nguyen (@usimarit)
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+speech_config:
+  sample_rate: 16000
+  frame_ms: 25
+  stride_ms: 10
+  num_feature_bins: 80
+  feature_type: log_mel_spectrogram
+  preemphasis: 0.97
+  normalize_signal: True
+  normalize_feature: True
+  normalize_per_frame: False
+
+decoder_config:
+  vocabulary: ../../sp.model
+
+model_config:
+  encoder_subsampling:
+    type: conv2d
+    filters: 640
+    kernel_size: 3
+    strides: 2
+  encoder_dmodel: 640
+  encoder_num_blocks: 22
+  encoder_head_size: 80
+  encoder_num_heads: 8
+  encoder_mha_type: relmha
+  encoder_kernel_size: 31
+  encoder_fc_factor: 1.
+  encoder_dropout: 0.1
+
+  encoder_time_reduce_idx: 
+    - 10
+  encoder_time_recover_idx: 
+    - 21
+
+  encoder_conv_use_glu: false
+  encoder_ds_subsample: true
+  encoder_no_post_ln: true
+  encoder_adaptive_scale: true
+  encoder_fixed_arch:
+    - M
+    - s
+    - C
+    - s
+
+learning_config:
+  train_dataset_config:
+    augmentation_config:
+      time_masking:
+        num_masks: 10
+        p_upperbound: 0.05
+      freq_masking:
+        num_masks: 2
+        mask_factor: 27
+    data_paths: null
+    shuffle: True
+    cache: True
+    buffer_size: 100
+    drop_remainder: True
+    stage: train
+
+  eval_dataset_config:
+    data_paths: null
+    shuffle: False
+    cache: True
+    buffer_size: 100
+    drop_remainder: True
+    stage: eval
+
+  test_dataset_config:
+    data_paths: null
+    shuffle: False
+    cache: True
+    buffer_size: 100
+    drop_remainder: False
+    stage: test
+
+  optimizer_config:
+    beta_1: 0.9
+    beta_2: 0.98
+    epsilon: 1e-9
+
+  running_config:
+    num_epochs: 1000
+    filepath: null
+    checkpoint:
+      filepath: null
+      save_best_only: False
+      save_weights_only: True
+      save_freq: epoch
+    states_dir: null
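
Note on `encoder_time_reduce_idx` and `encoder_time_recover_idx` (not part of the commit): unlike the Conformer configs, which set both to `null`, this config reduces the time resolution partway through the encoder and recovers it at the end. The sketch below shows one plausible reading of the two indices. It assumes 0-based block indices, a halving of the sequence length before the listed block runs, and a restoration of the original length after the recover block runs; the actual layer placement in the model code may differ, and `block_lengths` is a hypothetical helper.

```python
def block_lengths(num_frames, num_blocks, reduce_idx, recover_idx):
    """Sequence length seen by each encoder block.

    Assumptions: indices are 0-based; a reduction halves the length
    (rounding up) before the listed block; a recovery restores the
    original length after the listed block.
    """
    lengths = []
    current = num_frames
    for i in range(num_blocks):
        if i in reduce_idx:
            current = (current + 1) // 2
        lengths.append(current)
        if i in recover_idx:
            current = num_frames
    return lengths


# Values taken from squeezeformer-L.yml; 100 input frames is an arbitrary example.
lengths = block_lengths(num_frames=100, num_blocks=22, reduce_idx=[10], recover_idx=[21])
print(lengths)
```

Under these assumptions, the first ten blocks see the full sequence and the remaining twelve see half of it, which is where the compute saving would come from.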
--- a/examples/squeezeformer/configs/squeezeformer-M.yml
+++ b/examples/squeezeformer/configs/squeezeformer-M.yml
+# Copyright 2020 Huy Le Nguyen (@usimarit)
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+speech_config:
+  sample_rate: 16000
+  frame_ms: 25
+  stride_ms: 10
+  num_feature_bins: 80
+  feature_type: log_mel_spectrogram
+  preemphasis: 0.97
+  normalize_signal: True
+  normalize_feature: True
+  normalize_per_frame: False
+
+decoder_config:
+  vocabulary: ../../sp.model
+
+model_config:
+  encoder_subsampling:
+    type: conv2d
+    filters: 324
+    kernel_size: 3
+    strides: 2
+  encoder_dmodel: 324
+  encoder_num_blocks: 20
+  encoder_head_size: 81
+  encoder_num_heads: 4
+  encoder_mha_type: relmha
+  encoder_kernel_size: 31
+  encoder_fc_factor: 1.
+  encoder_dropout: 0.1
+
+  encoder_time_reduce_idx: 
+    - 9
+  encoder_time_recover_idx: 
+    - 19
+
+  encoder_conv_use_glu: false
+  encoder_ds_subsample: true
+  encoder_no_post_ln: true
+  encoder_adaptive_scale: true
+  encoder_fixed_arch:
+    - M
+    - s
+    - C
+    - s
+
+learning_config:
+  train_dataset_config:
+    augmentation_config:
+      time_masking:
+        num_masks: 7
+        p_upperbound: 0.05
+      freq_masking:
+        num_masks: 2
+        mask_factor: 27
+    data_paths: null
+    shuffle: True
+    cache: True
+    buffer_size: 100
+    drop_remainder: True
+    stage: train
+
+  eval_dataset_config:
+    data_paths: null
+    shuffle: False
+    cache: True
+    buffer_size: 100
+    drop_remainder: True
+    stage: eval
+
+  test_dataset_config:
+    data_paths: null
+    shuffle: False
+    cache: True
+    buffer_size: 100
+    drop_remainder: False
+    stage: test
+
+  optimizer_config:
+    beta_1: 0.9
+    beta_2: 0.98
+    epsilon: 1e-9
+
+  running_config:
+    num_epochs: 1000
+    filepath: null
+    checkpoint:
+      filepath: null
+      save_best_only: False
+      save_weights_only: True
+      save_freq: epoch
+    states_dir: null
--- a/examples/squeezeformer/configs/squeezeformer-ML.yml
+++ b/examples/squeezeformer/configs/squeezeformer-ML.yml
+# Copyright 2020 Huy Le Nguyen (@usimarit)
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+speech_config:
+  sample_rate: 16000
+  frame_ms: 25
+  stride_ms: 10
+  num_feature_bins: 80
+  feature_type: log_mel_spectrogram
+  preemphasis: 0.97
+  normalize_signal: True
+  normalize_feature: True
+  normalize_per_frame: False
+
+decoder_config:
+  vocabulary: ../../sp.model
+
+model_config:
+  encoder_subsampling:
+    type: conv2d
+    filters: 512
+    kernel_size: 3
+    strides: 2
+  encoder_dmodel: 512
+  encoder_num_blocks: 18
+  encoder_head_size: 64
+  encoder_num_heads: 8
+  encoder_mha_type: relmha
+  encoder_kernel_size: 31
+  encoder_fc_factor: 1.
+  encoder_dropout: 0.1
+
+  encoder_time_reduce_idx: 
+    - 8
+  encoder_time_recover_idx: 
+    - 17
+
+  encoder_conv_use_glu: false
+  encoder_ds_subsample: true
+  encoder_no_post_ln: true
+  encoder_adaptive_scale: true
+  encoder_fixed_arch:
+    - M
+    - s
+    - C
+    - s
+
+learning_config:
+  train_dataset_config:
+    augmentation_config:
+      time_masking:
+        num_masks: 10
+        p_upperbound: 0.05
+      freq_masking:
+        num_masks: 2
+        mask_factor: 27
+    data_paths: null
+    shuffle: True
+    cache: True
+    buffer_size: 100
+    drop_remainder: True
+    stage: train
+
+  eval_dataset_config:
+    data_paths: null
+    shuffle: False
+    cache: True
+    buffer_size: 100
+    drop_remainder: True
+    stage: eval
+
+  test_dataset_config:
+    data_paths: null
+    shuffle: False
+    cache: True
+    buffer_size: 100
+    drop_remainder: False
+    stage: test
+
+  optimizer_config:
+    beta_1: 0.9
+    beta_2: 0.98
+    epsilon: 1e-9
+
+  running_config:
+    num_epochs: 1000
+    filepath: null
+    checkpoint:
+      filepath: null
+      save_best_only: False
+      save_weights_only: True
+      save_freq: epoch
+    states_dir: null
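
Note on the model sizes (not part of the commit): in every config up to this point, `encoder_head_size` multiplied by `encoder_num_heads` equals `encoder_dmodel`. The check below copies the values from the files above. Whether the model code requires this relationship is not shown in this diff, so treat it as an observed pattern rather than a documented constraint.

```python
# (encoder_dmodel, encoder_head_size, encoder_num_heads), copied from the configs above.
configs = {
    "conformer-L": (512, 64, 8),
    "conformer-M": (256, 64, 4),
    "conformer-S": (144, 36, 4),
    "squeezeformer-L": (640, 80, 8),
    "squeezeformer-M": (324, 81, 4),
    "squeezeformer-ML": (512, 64, 8),
}


def heads_match(dmodel, head_size, num_heads):
    """True when the attention heads together span the model dimension."""
    return head_size * num_heads == dmodel


mismatches = [name for name, cfg in configs.items() if not heads_match(*cfg)]
print(mismatches)  # []
```

A check like this can catch a mistyped dimension when deriving a new variant from one of these files.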
--- a/examples/squeezeformer/configs/squeezeformer-S.yml
+++ b/examples/squeezeformer/configs/squeezeformer-S.yml
+# Copyright 2020 Huy Le Nguyen (@usimarit)
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+speech_config:
+  sample_rate: 16000
+  frame_ms: 25
+  stride_ms: 10
+  num_feature_bins: 80
+  feature_type: log_mel_spectrogram
+  preemphasis: 0.97
+  normalize_signal: True
+  normalize_feature: True
+  normalize_per_frame: False
+
+decoder_config:
+  vocabulary: ../../sp.model
+
+model_config:
+  encoder_subsampling:
+    type: conv2d
+    filters: 196
+    kernel_size: 3
+    strides: 2
+  encoder_dmodel: 196
+  encoder_num_blocks: 18
+  encoder_head_size: 49
+  encoder_num_heads: 4
+  encoder_mha_type: relmha
+  encoder_kernel_size: 31
+  encoder_fc_factor: 1.
+  encoder_dropout: 0.1
+
+  # time reduction
+  encoder_time_reduce_idx: 
+    - 8
+  encoder_time_recover_idx: 
+    - 17
+
+  encoder_conv_use_glu: false
+  encoder_ds_subsample: true
+  encoder_no_post_ln: true
+  encoder_adaptive_scale: true
+  encoder_fixed_arch:
+    - M
+    - s
+    - C
+    - s
+
+learning_config:
+  train_dataset_config:
+    augmentation_config:
+      time_masking:
+        num_masks: 5
+        p_upperbound: 0.05
+      freq_masking:
+        num_masks: 2
+        mask_factor: 27
+    data_paths: null
+    shuffle: True
+    cache: True
+    buffer_size: 100
+    drop_remainder: True
+    stage: train
+
+  eval_dataset_config:
+    data_paths: null
+    shuffle: False
+    cache: True
+    buffer_size: 100
+    drop_remainder: True
+    stage: eval
+
+  test_dataset_config:
+    data_paths: null
+    shuffle: False
+    cache: True
+    buffer_size: 100
+    drop_remainder: False
+    stage: test
+
+  optimizer_config:
+    beta_1: 0.9
+    beta_2: 0.98
+    epsilon: 1e-9
+
+  running_config:
+    num_epochs: 1000
+    filepath: null
+    checkpoint:
+      filepath: null
+      save_best_only: False
+      save_weights_only: True
+      save_freq: epoch
+    states_dir: null
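The `speech_config` block shared by these configs gives the window and hop in milliseconds. The sketch below turns them into sample counts, assuming the featurizer converts milliseconds to samples by plain multiplication with the sample rate (the `TFSpeechFeaturizer` code is not shown in this section, so this is an assumption). `ms_to_samples` is a hypothetical helper, not part of the repository.

```python
def ms_to_samples(sample_rate: int, duration_ms: float) -> int:
    """Hypothetical helper: number of samples spanned by duration_ms."""
    return int(round(sample_rate * duration_ms / 1000))


# Values copied from the speech_config block in the configs above.
speech_config = {"sample_rate": 16000, "frame_ms": 25, "stride_ms": 10}

frame_length = ms_to_samples(speech_config["sample_rate"], speech_config["frame_ms"])
frame_step = ms_to_samples(speech_config["sample_rate"], speech_config["stride_ms"])

print(frame_length, frame_step)  # 400 160
```

Under that assumption, each log-mel frame covers 400 samples and consecutive frames start 160 samples apart.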
--- a/examples/squeezeformer/configs/squeezeformer-SM.yml
+++ b/examples/squeezeformer/configs/squeezeformer-SM.yml
+# Copyright 2020 Huy Le Nguyen (@usimarit)
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+speech_config:
+  sample_rate: 16000
+  frame_ms: 25
+  stride_ms: 10
+  num_feature_bins: 80
+  feature_type: log_mel_spectrogram
+  preemphasis: 0.97
+  normalize_signal: True
+  normalize_feature: True
+  normalize_per_frame: False
+
+decoder_config:
+  vocabulary: ../../sp.model
+
+model_config:
+  encoder_subsampling:
+    type: conv2d
+    filters: 256
+    kernel_size: 3
+    strides: 2
+  encoder_dmodel: 256
+  encoder_num_blocks: 16
+  encoder_head_size: 64
+  encoder_num_heads: 4
+  encoder_mha_type: relmha
+  encoder_kernel_size: 31
+  encoder_fc_factor: 1.
+  encoder_dropout: 0.1
+
+  encoder_time_reduce_idx: 
+    - 7
+  encoder_time_recover_idx: 
+    - 15
+
+  encoder_conv_use_glu: false
+  encoder_ds_subsample: true
+  encoder_no_post_ln: true
+  encoder_adaptive_scale: true
+  encoder_fixed_arch:
+    - M
+    - s
+    - C
+    - s
+
+learning_config:
+  train_dataset_config:
+    augmentation_config:
+      time_masking:
+        num_masks: 5
+        p_upperbound: 0.05
+      freq_masking:
+        num_masks: 2
+        mask_factor: 27
+    data_paths: null
+    shuffle: True
+    cache: True
+    buffer_size: 100
+    drop_remainder: True
+    stage: train
+
+  eval_dataset_config:
+    data_paths: null
+    shuffle: False
+    cache: True
+    buffer_size: 100
+    drop_remainder: True
+    stage: eval
+
+  test_dataset_config:
+    data_paths: null
+    shuffle: False
+    cache: True
+    buffer_size: 100
+    drop_remainder: False
+    stage: test
+
+  optimizer_config:
+    beta_1: 0.9
+    beta_2: 0.98
+    epsilon: 1e-9
+
+  running_config:
+    num_epochs: 1000
+    filepath: null
+    checkpoint:
+      filepath: null
+      save_best_only: False
+      save_weights_only: True
+      save_freq: epoch
+    states_dir: null
--- a/examples/squeezeformer/configs/squeezeformer-XS.yml
+++ b/examples/squeezeformer/configs/squeezeformer-XS.yml
+# Copyright 2020 Huy Le Nguyen (@usimarit)
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+speech_config:
+  sample_rate: 16000
+  frame_ms: 25
+  stride_ms: 10
+  num_feature_bins: 80
+  feature_type: log_mel_spectrogram
+  preemphasis: 0.97
+  normalize_signal: True
+  normalize_feature: True
+  normalize_per_frame: False
+
+decoder_config:
+  vocabulary: ../../sp.model
+
+model_config:
+  encoder_subsampling:
+    type: conv2d
+    filters: 144
+    kernel_size: 3
+    strides: 2
+  encoder_dmodel: 144
+  encoder_num_blocks: 16
+  encoder_head_size: 36
+  encoder_num_heads: 4
+  encoder_mha_type: relmha
+  encoder_kernel_size: 31
+  encoder_fc_factor: 1.
+  encoder_dropout: 0.1
+
+  # time reduction
+  encoder_time_reduce_idx: 
+    - 7
+  encoder_time_recover_idx: 
+    - 15
+
+  encoder_conv_use_glu: false
+  encoder_ds_subsample: true
+  encoder_no_post_ln: true
+  encoder_adaptive_scale: true
+  encoder_fixed_arch:
+    - M
+    - s
+    - C
+    - s
+
+learning_config:
+  train_dataset_config:
+    augmentation_config:
+      time_masking:
+        num_masks: 5
+        p_upperbound: 0.05
+      freq_masking:
+        num_masks: 2
+        mask_factor: 27
+    data_paths: null
+    shuffle: True
+    cache: True
+    buffer_size: 100
+    drop_remainder: True
+    stage: train
+
+  eval_dataset_config:
+    data_paths: null
+    shuffle: False
+    cache: True
+    buffer_size: 100
+    drop_remainder: True
+    stage: eval
+
+  test_dataset_config:
+    data_paths: null
+    shuffle: False
+    cache: True
+    buffer_size: 100
+    drop_remainder: False
+    stage: test
+
+  optimizer_config:
+    beta_1: 0.9
+    beta_2: 0.98
+    epsilon: 1e-9
+
+  running_config:
+    num_epochs: 1000
+    filepath: null
+    checkpoint:
+      filepath: null
+      save_best_only: False
+      save_weights_only: True
+      save_freq: epoch
+    states_dir: null
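The four configs in this part of the diff differ mainly in `encoder_dmodel`, `encoder_num_heads`, and `encoder_head_size`. The sketch below checks that heads times head size equals the model width in each of them. That this equality is required is a common convention for multi-head attention and an assumption here, since the encoder code is not part of this section. `heads_match_dmodel` is a hypothetical helper.

```python
# (encoder_dmodel, encoder_num_heads, encoder_head_size) copied from the configs above.
configs = {
    "first config (file name not shown)": (512, 8, 64),
    "squeezeformer-S": (196, 4, 49),
    "squeezeformer-SM": (256, 4, 64),
    "squeezeformer-XS": (144, 4, 36),
}


def heads_match_dmodel(dmodel: int, num_heads: int, head_size: int) -> bool:
    """Hypothetical check: the concatenated heads span exactly dmodel features."""
    return num_heads * head_size == dmodel


mismatches = [name for name, values in configs.items() if not heads_match_dmodel(*values)]
print(mismatches)  # []
```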
--- a/examples/squeezeformer/test.py
+++ b/examples/squeezeformer/test.py
+# Copyright 2020 Huy Le Nguyen (@usimarit)
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+from tqdm import tqdm
+import argparse
+from scipy.special import softmax
+import datasets
+
+import tensorflow as tf
+
+from src.configs.config import Config
+from src.datasets.asr_dataset import ASRSliceDataset
+from src.featurizers.speech_featurizers import TFSpeechFeaturizer
+from src.featurizers.text_featurizers import SentencePieceFeaturizer
+from src.models.conformer import ConformerCtc
+from src.utils import env_util, file_util
+
+logger = env_util.setup_environment()
+
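+physical_devices = tf.config.list_physical_devices('GPU')
+# Enable memory growth only when a GPU is present; indexing an empty list raises IndexError.
+if physical_devices:
+    tf.config.experimental.set_memory_growth(physical_devices[0], True)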
+
+DEFAULT_YAML = os.path.join(os.path.abspath(os.path.dirname(__file__)), "config.yml")
+
+tf.keras.backend.clear_session()
+
+def parse_arguments():
+    parser = argparse.ArgumentParser(prog="Conformer Testing")
+
+    parser.add_argument("--config", type=str, default=DEFAULT_YAML, help="The file path of model configuration file")
+    parser.add_argument("--mxp", default=False, action="store_true", help="Enable mixed precision")
+    parser.add_argument("--device", type=int, default=0, help="Device's id to run test on")
+    parser.add_argument("--cpu", default=False, action="store_true", help="Whether to only use cpu")
+    parser.add_argument("--saved", type=str, default=None, help="Path to saved model")
+    parser.add_argument("--output", type=str, default=None, help="Result filepath")
+
+    # Dataset arguments
+    parser.add_argument("--bs", type=int, default=None, help="Test batch size")
+    parser.add_argument("--dataset_path", type=str, required=True, help="path to the tsv manifest files")
+    parser.add_argument("--dataset", type=str, default="test_other", 
+                        choices=["dev_clean", "dev_other", "test_clean", "test_other"], help="Testing dataset")
+    parser.add_argument("--input_padding", type=int, default=3700)
+    parser.add_argument("--label_padding", type=int, default=530)
+
+    # Architecture arguments
+    parser.add_argument("--fixed_arch", default=None, help="force fixed architecture")
+
+    # Decoding arguments
+    parser.add_argument("--beam_size", type=int, default=None, help="ctc beam size")
+
+    args = parser.parse_args()
+    return args
+
+
+def parse_fixed_arch(args):
+    parsed_arch = args.fixed_arch.split('|')
+    i, rep = 0, 1
+    fixed_arch = []
+    while i < len(parsed_arch):
+        if parsed_arch[i].isnumeric():
+            rep = int(parsed_arch[i])
+        else:
+            block = parsed_arch[i].split(',')
+            assert len(block) == NUM_LAYERS_IN_BLOCK
+            for _ in range(rep):
+                fixed_arch.append(block)
+            rep = 1
+        i += 1
+    return fixed_arch
+
+args = parse_arguments()
+
+config = Config(args.config)
+
+NUM_BLOCKS = config.model_config['encoder_num_blocks']
+NUM_LAYERS_IN_BLOCK = 4
+
+tf.config.optimizer.set_experimental_options({"auto_mixed_precision": args.mxp})
+env_util.setup_devices([args.device], cpu=args.cpu)
+
+speech_featurizer = TFSpeechFeaturizer(config.speech_config)
+
+logger.info("Use SentencePiece ...")
+text_featurizer = SentencePieceFeaturizer(config.decoder_config)
+
+tf.random.set_seed(0)
+
+# Parse fixed architecture
+if args.fixed_arch is not None:
+    fixed_arch = parse_fixed_arch(args)
+    if len(fixed_arch) != NUM_BLOCKS:
+        logger.warning(
+            f"encoder_num_blocks={config.model_config['encoder_num_blocks']} is "
+            f"different from len(fixed_arch) = {len(fixed_arch)}."
+        )
+        logger.warning(f"Changing `encoder_num_blocks` to {len(fixed_arch)}")
+        config.model_config['encoder_num_blocks'] = len(fixed_arch)
+    logger.info(f"Changing fixed arch: {fixed_arch}")
+    config.model_config['encoder_fixed_arch'] = fixed_arch
+
+if args.dataset_path is not None:
+    dataset_path = os.path.join(args.dataset_path, f"{args.dataset}.tsv")
+    logger.info(f"dataset: {args.dataset} at {dataset_path}")
+    config.learning_config.test_dataset_config.data_paths = [dataset_path]
+else:
+    raise ValueError("specify the manifest file path using --dataset_path")
+
+test_dataset = ASRSliceDataset(
+    speech_featurizer=speech_featurizer,
+    text_featurizer=text_featurizer,
+    input_padding_length=args.input_padding,
+    label_padding_length=args.label_padding,
+    **vars(config.learning_config.test_dataset_config)
+)
+
+conformer = ConformerCtc(
+    **config.model_config, 
+    vocabulary_size=text_featurizer.num_classes, 
+)
+
+conformer.make(speech_featurizer.shape)
+
+if args.saved:
+    conformer.load_weights(args.saved, by_name=True)
+else:
+    logger.warning("Model is initialized randomly, please use --saved to assign checkpoint")
+conformer.summary(line_length=100)
+conformer.add_featurizers(speech_featurizer, text_featurizer)
+
+batch_size = args.bs or config.learning_config.running_config.batch_size
+test_data_loader = test_dataset.create(batch_size)
+
+blank_id = text_featurizer.blank
+
+true_decoded = []
+pred_decoded = []
+beam_decoded = []
+
+#for batch in enumerate(test_data_loader):
+for k, batch in tqdm(enumerate(test_data_loader)):
+    labels, labels_len = batch[1]['labels'], batch[1]['labels_length']
+
+    outputs = conformer(batch[0], training=False)
+    logits, logits_len = outputs['logits'], outputs['logits_length']
+    probs = softmax(logits, axis=-1)  # normalize over the vocabulary axis, not the whole batch
+
+    if args.beam_size is not None:
+        beam = tf.nn.ctc_beam_search_decoder(
+            tf.transpose(logits, perm=[1, 0, 2]), logits_len, beam_width=args.beam_size, top_paths=1,
+        )
+        beam = tf.sparse.to_dense(beam[0][0]).numpy()
+
+    for i, (p, l, label, ll) in enumerate(zip(probs, logits_len, labels, labels_len)):
+        # p: length x characters
+        pred = p[:l].argmax(-1)
+        decoded_prediction = []
+        previous = blank_id
+
+        # remove the repeating characters and the blank characters
+        for token in pred:
+            if (token != previous or previous == blank_id) and token != blank_id:
+                decoded_prediction.append(token)
+            previous = token
+
+        if len(decoded_prediction) == 0:
+            decoded = ""
+        else:
+            decoded = text_featurizer.iextract([decoded_prediction]).numpy()[0].decode('utf-8')
+        pred_decoded.append(decoded)
+        label_len = tf.math.reduce_sum(tf.cast(label != 0, tf.int32))
+        true_decoded.append(text_featurizer.iextract([label[:label_len]]).numpy()[0].decode('utf-8'))
+
+        if args.beam_size is not None:
+            b = beam[i]
+            previous = blank_id
+
+            # remove the repeating characters and the blank characters
+            beam_prediction = []
+            for token in b:
+                if (token != previous or previous == blank_id) and token != blank_id:
+                    beam_prediction.append(token)
+                previous = token
+
+            if len(beam_prediction) == 0:
+                decoded = ""
+            else:
+                decoded = text_featurizer.iextract([beam_prediction]).numpy()[0].decode('utf-8')
+            beam_decoded.append(decoded)
+
+wer_metric = datasets.load_metric("wer")
+logger.info(f"Length decoded: {len(true_decoded)}")
+logger.info(f"WER: {wer_metric.compute(predictions=pred_decoded, references=true_decoded)}")
+
+if args.beam_size is not None:
+    logger.info(f"WER-beam: {wer_metric.compute(predictions=beam_decoded, references=true_decoded)}")
+
+
+if args.output is not None:
+    with file_util.save_file(file_util.preprocess_paths(args.output)) as filepath:
+        overwrite = True
+        if tf.io.gfile.exists(filepath):
+            overwrite = input(f"Overwrite existing result file {filepath} ? (y/n): ").lower() == "y"
+        if overwrite:
+            logger.info(f"Saving result to {args.output} ...")
+            with open(filepath, "w") as openfile:
+                openfile.write("PATH\tDURATION\tGROUNDTRUTH\tGREEDY\tBEAMSEARCH\n")
+                # One row is written per utterance, so size the progress bar by utterance count.
+                progbar = tqdm(total=len(true_decoded), unit="utt")
+                for i, (groundtruth, greedy) in enumerate(zip(true_decoded, pred_decoded)):
+                    openfile.write(f"N/A\tN/A\t{groundtruth}\t{greedy}\tN/A\n")
+                    progbar.update(1)
+                progbar.close()
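The greedy decoding loop in `test.py` collapses the per-frame argmax ids by dropping blanks and merging consecutive repeats. A minimal standalone sketch of that collapse rule follows. `ctc_greedy_collapse` is a hypothetical name, and the condition is written in a simplified form that behaves the same as the one in the script: a non-blank token is kept whenever it differs from the previous frame's token.

```python
from typing import List, Sequence


def ctc_greedy_collapse(frame_ids: Sequence[int], blank_id: int) -> List[int]:
    """Drop blanks and merge consecutive repeats of the same token id."""
    collapsed = []
    previous = blank_id
    for token in frame_ids:
        # Keep a token only if it is not blank and not a repeat of the previous frame.
        if token != blank_id and token != previous:
            collapsed.append(token)
        previous = token
    return collapsed


# Made-up frame-level ids; 0 plays the role of the blank here.
example = [0, 1, 1, 0, 1, 2, 2, 0]
print(ctc_greedy_collapse(example, blank_id=0))  # [1, 1, 2]
```

The blank between the two runs of `1` is what lets the same token appear twice in the output.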
--- a/requirements.txt
+++ b/requirements.txt
+cython
+numpy
+scipy
+scikit-learn
+pandas
+tensorflow-datasets>=4.2.0
+tensorflow-addons>=0.11.1
+setuptools>=47.1.1
+librosa>=0.8.0
+soundfile>=0.10.3
+PyYAML>=5.3.1
+matplotlib>=3.2.1
+sox>=1.4.1
+tqdm>=4.54.1
+colorama>=0.4.4
+nlpaug>=1.1.1
+nltk>=3.5
+sentencepiece>=0.1.94
+wandb
+tensorflow_probability
+tensorflow_io==0.18
+google-cloud-storage
+cloud-tpu-client
+datasets
+jiwer
--- a/scripts/create_librispeech_trans.py
+++ b/scripts/create_librispeech_trans.py
+# Copyright 2020 Huy Le Nguyen (@usimarit)
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import glob
+import argparse
+import librosa
+from tqdm.auto import tqdm
+import unicodedata
+
+from src.utils.file_util import preprocess_paths
+
+parser = argparse.ArgumentParser(prog="Setup LibriSpeech Transcripts")
+
+parser.add_argument("--dir", "-d", type=str, default=None, help="Directory of dataset")
+
+parser.add_argument("output", type=str, default=None, help="The output .tsv transcript file path")
+
+args = parser.parse_args()
+
+assert args.dir and args.output
+
+args.dir = preprocess_paths(args.dir, isdir=True)
+args.output = preprocess_paths(args.output)
+
+transcripts = []
+
+text_files = glob.glob(os.path.join(args.dir, "**", "*.txt"), recursive=True)
+
+for text_file in tqdm(text_files, desc="[Loading]"):
+    current_dir = os.path.dirname(text_file)
+    with open(text_file, "r", encoding="utf-8") as txt:
+        lines = txt.read().splitlines()
+    for line in lines:
+        line = line.split(" ", maxsplit=1)
+        audio_file = os.path.join(current_dir, line[0] + ".flac")
+        y, sr = librosa.load(audio_file, sr=None)
+        duration = librosa.get_duration(y=y, sr=sr)
+        text = unicodedata.normalize("NFC", line[1].lower())
+        transcripts.append(f"{audio_file}\t{duration}\t{text}\n")
+
+with open(args.output, "w", encoding="utf-8") as out:
+    out.write("PATH\tDURATION\tTRANSCRIPT\n")
+    for line in tqdm(transcripts, desc="[Writing]"):
+        out.write(line)
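Each transcript line is split once on the first space into an utterance id and its text, and the text is lowercased and NFC-normalized before being written to the manifest. A small sketch of that per-line step, without the audio loading: `parse_transcript_line` is a hypothetical helper and the sample line is made up.

```python
import unicodedata
from typing import Tuple


def parse_transcript_line(line: str) -> Tuple[str, str]:
    """Split '<utterance-id> <TEXT>' and normalize the text as the script does."""
    utt_id, text = line.split(" ", maxsplit=1)
    return utt_id, unicodedata.normalize("NFC", text.lower())


# Made-up line in the '<id> <UPPERCASE TEXT>' layout the script expects.
sample = "0000-000000-0000 HELLO WORLD"
utt_id, text = parse_transcript_line(sample)
print(utt_id)  # 0000-000000-0000
print(text)    # hello world
```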
--- a/scripts/create_librispeech_trans_all.py
+++ b/scripts/create_librispeech_trans_all.py
+import os
+import csv
+import subprocess
+import argparse
+
+def arg_parse():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--mode', type=str, default='all', choices=['all', 'test-only'])
+    parser.add_argument('--dataset_dir', type=str, required=True)
+    parser.add_argument('--output_dir', type=str, required=True)
+    args = parser.parse_args()
+    return args
+
+args = arg_parse()
+
+for n in ['dev', 'test']:
+    for m in ['clean', 'other']:
+        outname = f'{n}_{m}.tsv'
+        inname = f'{n}-{m}'
+        print(f'processing {inname}')
+        subprocess_args = [
+            'python', 'create_librispeech_trans.py', os.path.join(args.output_dir, outname),
+            '--dir', os.path.join(args.dataset_dir, inname)
+        ]
+
+        subprocess.call(subprocess_args)
+
+if args.mode == 'all':
+    train_set_names = [
+        ('train-clean-100', 'train_clean_100.tsv'), 
+        ('train-clean-360', 'train_clean_360.tsv'), 
+        ('train-other-500', 'train_other_500.tsv'), 
+    ]
+
+    for inname, outname in train_set_names:
+        print(f'processing {inname}')
+        subprocess_args = [
+            'python', 'create_librispeech_trans.py', os.path.join(args.output_dir, outname),
+            '--dir', os.path.join(args.dataset_dir, inname)
+        ]
+
+        subprocess.call(subprocess_args)
+
+    lines = ["PATH\tDURATION\tTRANSCRIPT\n"]
+
+    tsv_names = [x[-1] for x in train_set_names]
+    for tsv_name in tsv_names:
+        infile = os.path.join(args.output_dir, tsv_name)
+
+        # Read with the same encoding that create_librispeech_trans.py writes.
+        with open(infile, encoding="utf-8") as file:
+            tsv_file = csv.reader(file, delimiter="\t")
+            for i, line in enumerate(tsv_file):
+                if i == 0: continue
+                audio_file, duration, text = line
+                lines.append(f"{audio_file}\t{duration}\t{text}\n")
+
+    output_file = os.path.join(args.output_dir, 'train_all.tsv')
+    with open(output_file, "w", encoding="utf-8") as out:
+        for line in lines:
+            out.write(line)
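The `all` mode concatenates the three training manifests into `train_all.tsv`, keeping a single header row. The merge step can be sketched on in-memory text as below. `merge_manifests` is a hypothetical helper and the two manifests are made-up examples, not real dataset rows.

```python
import csv
import io
from typing import Iterable, List

HEADER = ["PATH", "DURATION", "TRANSCRIPT"]


def merge_manifests(manifests: Iterable[str]) -> List[List[str]]:
    """Concatenate tab-separated manifests, keeping one header row."""
    rows = [HEADER]
    for text in manifests:
        reader = csv.reader(io.StringIO(text), delimiter="\t")
        next(reader, None)  # skip this manifest's own header row
        rows.extend(row for row in reader if row)
    return rows


a = "PATH\tDURATION\tTRANSCRIPT\n/a.flac\t1.0\thello\n"
b = "PATH\tDURATION\tTRANSCRIPT\n/b.flac\t2.5\tworld\n"
merged = merge_manifests([a, b])
print(len(merged))  # 3
```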
--- a/scripts/install_ctc_decoders.sh
+++ b/scripts/install_ctc_decoders.sh
+#!/usr/bin/env bash
+
+mkdir -p externals
+cd ./externals || exit
+
+# Install baidu's beamsearch_with_lm
+if [ ! -d ctc_decoders ]; then
+    git clone https://github.com/usimarit/ctc_decoders.git
+
+    cd ./ctc_decoders || exit
+    chmod a+x setup.sh
+    chown "$USER":"$USER" setup.sh
+    ./setup.sh
+
+    cd ..
+fi
+
+cd ..
--- a/setup.py
+++ b/setup.py
+# Copyright 2020 Huy Le Nguyen (@usimarit)
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import setuptools
+
+with open("README.md", "r", encoding="utf-8") as fh:
+    long_description = fh.read()
+
+with open("requirements.txt", "r") as fr:
+    requirements = fr.read().splitlines()
+
+setuptools.setup(
+    name="squeezeformer",
+    packages=setuptools.find_packages(include=["src*"]),
+    install_requires=requirements,
+    extras_require={
+        #"tf2.3": ["tensorflow>=2.3.0,<2.4", "tensorflow-text>2.3.0,<2.4", "tensorflow-io>=0.16.0,<0.17"],
+        #"tf2.3-gpu": ["tensorflow-gpu>=2.3.0,<2.4", "tensorflow-text>=2.3.0,<2.4", "tensorflow-io>=0.16.0,<0.17"],
+        #"tf2.4": ["tensorflow>=2.4.0,<2.5", "tensorflow-text>=2.4.0,<2.5", "tensorflow-io>=0.17.0,<0.18"],
+        #"tf2.4-gpu": ["tensorflow-gpu>=2.4.0,<2.5", "tensorflow-text>=2.4.0,<2.5", "tensorflow-io>=0.17.0,<0.18"],
+        "tf2.5": ["tensorflow>=2.5.0,<2.6", "tensorflow-text>=2.5.0,<2.6", "tensorflow-io>=0.18.0,<0.19"],
+        "tf2.5-gpu": ["tensorflow-gpu>=2.5.0,<2.6", "tensorflow-text>=2.5.0,<2.6", "tensorflow-io>=0.18.0,<0.19"]
+    },
+    classifiers=[
+        "Programming Language :: Python :: 3.6",
+        "Programming Language :: Python :: 3.7",
+        "Programming Language :: Python :: 3.8",
+        "Intended Audience :: Science/Research",
+        "Operating System :: POSIX :: Linux",
+        "License :: OSI Approved :: Apache Software License",
+        "Topic :: Software Development :: Libraries :: Python Modules"
+    ],
+    python_requires='>=3.6',
+)
--- a/sp.model
+++ b/sp.model
--- a/src/__init__.py
+++ b/src/__init__.py