<p align="center"><img width="160" src="doc/lip_white.png" alt="logo"></p>
<h1 align="center">RNN-T ASR/VSR/AV-ASR Examples</h1>

This repository contains sample implementations of training and evaluation pipelines for RNN-T-based automatic, visual, and audio-visual speech recognition (ASR, VSR, AV-ASR) models on LRS3, in both streaming and non-streaming modes. We follow the same training pipeline as [AutoAVSR](https://arxiv.org/abs/2303.14307).

## Preparation
1. Set up the environment.
```Shell
conda create -y -n autoavsr python=3.8
conda activate autoavsr
```

2. Install the PyTorch nightly builds (PyTorch, torchvision, torchaudio) from [source](https://pytorch.org/get-started/), along with all other necessary packages:

```Shell
pip install pytorch-lightning sentencepiece
```
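
The exact nightly install command for step 2 depends on your platform and CUDA version; the line below is only a sketch (the `cu121` nightly index is an assumption), so check the selector on [pytorch.org](https://pytorch.org/get-started/) for the command matching your setup:

```Shell
# Example nightly install for a CUDA 12.1 machine; swap the index URL for your CUDA version.
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
```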

3. Preprocess LRS3 into a cropped-face dataset using the scripts in the [data_prep](./data_prep) folder.

4. `[sp_model_path]` is a SentencePiece model used to encode targets; it can be generated with `train_spm.py`, for example as sketched below.
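
As an illustration only (the exact flags used by `train_spm.py` may differ), a unigram SentencePiece model with 1023 pieces, matching the `spm_unigram_1023.model` filename used below, could be produced with the `spm_train` CLI:

```Shell
# Assumed input: a plain-text file with one transcript per line.
spm_train --input=lrs3_transcripts.txt \
          --model_prefix=spm_unigram_1023 \
          --model_type=unigram \
          --vocab_size=1023
```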

### Training ASR or VSR model

- `[root_dir]` is the root directory for the LRS3 cropped-face dataset.
- `[modality]` is the input modality, one of `v`, `a`, or `av`.
- `[mode]` is the model type, either `online` (streaming) or `offline` (non-streaming).

```Shell
python train.py --root-dir [root_dir] \
                --sp-model-path ./spm_unigram_1023.model \
                --exp-dir ./exp \
                --num-nodes 8 \
                --gpus 8 \
                --md [modality] \
                --mode [mode]
```
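
For example, a hypothetical single-node, single-GPU run that trains an offline VSR model might look like this (the dataset path and experiment directory are placeholders):

```Shell
python train.py --root-dir /path/to/LRS3_cropped \
                --sp-model-path ./spm_unigram_1023.model \
                --exp-dir ./exp/vsr_offline \
                --num-nodes 1 \
                --gpus 1 \
                --md v \
                --mode offline
```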

### Training AV-ASR model

```Shell
python train.py --root-dir [root_dir] \
                --sp-model-path ./spm_unigram_1023.model \
                --exp-dir ./exp \
                --num-nodes 8 \
                --gpus 8 \
                --md av \
                --mode [mode]
```

### Evaluating models

```Shell
python eval.py --dataset-path [dataset_path] \
               --sp-model-path ./spm_unigram_1023.model \
               --md [modality] \
               --mode [mode] \
               --checkpoint-path [checkpoint_path]
```
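
For instance, evaluating an offline AV-ASR checkpoint might look like the following (all paths are placeholders):

```Shell
python eval.py --dataset-path /path/to/LRS3_cropped \
               --sp-model-path ./spm_unigram_1023.model \
               --md av \
               --mode offline \
               --checkpoint-path ./exp/avsr_offline/checkpoint.ckpt
```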

The table below reports the word error rate (WER) for AV-ASR models under offline evaluation.

|          Model           |    WER [%]   |   Params (M)   |
|:------------------------:|:------------:|:--------------:|
| **Non-streaming models** |              |                |
|          AV-ASR          |      4.0     |       50       |
|   **Streaming models**   |              |                |
|          AV-ASR          |      4.3     |       40       |