README.md 4.16 KB
Newer Older
Sehoon Kim's avatar
Sehoon Kim committed
1
2
3
4
5
# Squeezeformer:  An Efficient Transformer for Automatic Speech Recognition
![Screenshot from 2022-05-22 15-23-07](https://user-images.githubusercontent.com/50283958/169718508-fa7fd22f-9038-44f8-9e8e-bc9c6afff124.png)

We provide testing codes for Squeezeformer, along with the pre-trained checkpoints.

Sehoon Kim's avatar
Sehoon Kim committed
6
7
Check out [paper](https://arxiv.org/pdf/2206.00888.pdf) for more details.

Sehoon Kim's avatar
Sehoon Kim committed
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
## Install Squeezeformer

We recommend using Python version 3.8.  

### 1. Install dependancies

We support Tensorflow version of 2.5. Run the following commands depending on your target device type.

* Running on CPUs: `pip install -e '.[tf2.5]'`
* Running on GPUs: `pip install -e '.[tf2.5-gpu]'`

### 2. Install CTC decoder 
```bash
cd scripts
bash install_ctc_decoders.sh
```

## Prepare Dataset

### 1. Download Librispeech

[Librispeech](https://ieeexplore.ieee.org/document/7178964) is a widely-used ASR benchmark that consists of 960hr speech corpus with text transcriptions.
The dataset consists of 3 training sets (`train-clean-100`, `train-clean-360`, `train-other-500`), 
2 development sets (`dev-clean`, `dev-other`), and 2 test sets (`test-clean`, `test-other`).

Download the datasets from this [link](http://www.openslr.org/12) and untar them.
If this is for testing purposes only, you can skip the training datasets to save disk space.
You should have flac files under `{dataset_path}/LibriSpeech`.

### 2. Create Manifest Files

Once you download the datasets, you should create a manifest file that links the file path to the audio input and its transcription.
We use a script from [TensorFlowASR](https://github.com/TensorSpeech/TensorFlowASR).

```bash
cd scripts
python create_librispeech_trans_all.py --data {dataset_path}/LibriSpeech --output {tsv_dir}
```
* The `dataset_path` is the directory that you untarred the datasets in the previous step.
* This script creates tsv files under `tsv_dir` that list the audio file path, duration, and the transcription.
* To skip processing the training datasets, use an additional argument `--mode test-only`.

If you have followed the instruction correctly, you should have the following files under `tsv_dir`.
* `dev_clean.tsv`, `dev_other.tsv`, `test_clean.tsv`, `test_other.tsv`
* `train_clean_100.tsv`, `train_clean_360.tsv`, `train_other_500.tsv` (if not `--mode test-only`)
* `train_other.tsv` that merges all training tsv files into one (if not `--mode test-only`)


## Testing Squeezeformer

### 1. Download Pre-trained Checkpoints

We provide pre-trained checkpoints for all variants of Squeezeformer.

|      **Model**      |                                                  **Checkpoint**                            | **test-clean** | **test-other** |
| :-----------------: | :---------------------------------------------------------------------------------------:  | :------------: | :------------: |
|  Squeezeformer-XS   | [link](https://drive.google.com/file/d/1qSukKHz2ltBiWU-xHGmI-P9ziPJcLcSu/view?usp=sharing) |    3.74        |      9.09      |
|  Squeezeformer-S    | [link](https://drive.google.com/file/d/1PGao0AOe5aQXc-9eh2RDQZnZ4UcefcHB/view?usp=sharing) |    3.08        |      7.47      |
|  Squeezeformer-SM   | [link](https://drive.google.com/file/d/17cL1p0KJgT-EBu_-bg3bF7-Uh-pnf-8k/view?usp=sharing) |    2.79        |      6.89      |
|  Squeezeformer-M    | [link](https://drive.google.com/file/d/1fbaby-nOxHAGH0GqLoA0DIjFDPaOBl1d/view?usp=sharing) |    2.56        |      6.50      |
|  Squeezeformer-ML   | [link](https://drive.google.com/file/d/1-ZPtJjJUHrcbhPp03KioadenBtKpp-km/view?usp=sharing) |    2.61        |      6.05      |
|  Squeezeformer-L    | [link](https://drive.google.com/file/d/1LJua7A4ZMoZFi2cirf9AnYEl51pmC-m5/view?usp=sharing) |    2.47        |      5.97      |


### 2. Run Inference!

Run the following commands:
```bash
cd examples/squeezeformer
python test.py --bs {batch_size} --config configs/squeezeformer-S.yml --saved squeezeformer-S.h5 \
    --dataset_path {tsv_dir} --dataset {dev_clean|dev_other|test_clean|test_other}
```

* `tsv_dir` is the directory path to the tsv manifest files that you created in the previous step.
* You can test on other Squeezeformer models by changing `--config` and `--saved`, e.g., Squeezeformer-L or Squeezeformer-M.