# Performance Record

This is a Chinese speech recognition recipe that trains on multiple Chinese corpora, including:

| Dataset    | Duration (Hours) |
|------------|------------------|
| Aidatatang | 140              |
| Aishell    | 151              |
| MagicData  | 712              |
| Primewords | 99               |
| ST-CMDS    | 110              |
| THCHS-30   | 26               |
| TAL-ASR    | 587              |
| AISHELL2   | 1000             |

## Unified Transformer Result

### Data info:

* Dataset: Aidatatang, Aishell, MagicData, Primewords, ST-CMDS, and THCHS-30.
* Feature info: using fbank feature, with cmvn, no speed perturb.
* Training info: lr 0.004, batch size 18, 3 machines, 3*8 = 24 GPUs, acc_grad 1, 220 epochs, dither 0.1
* Decoding info: ctc_weight 0.5, average_num 30 (checkpoint averaging, sketched below)
* Git hash: 013794572a55c7d0dbea23a66106ccf3e5d3b8d4
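
Here `average_num 30` refers to checkpoint averaging: decoding uses a single model whose parameters are the element-wise mean of the last (or best) 30 training checkpoints. A minimal sketch of that step, assuming plain PyTorch checkpoints that store a state dict (the file names and epoch range are illustrative, not taken from this recipe):

```python
# Checkpoint-averaging sketch; file names and epoch range are hypothetical.
import torch

def average_checkpoints(paths):
    """Element-wise average of the parameters stored in several checkpoints."""
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k in avg:
                avg[k] += state[k].float()
    for k in avg:
        avg[k] /= len(paths)
    return avg

# e.g. average the final 30 of 220 epochs (average_num 30)
paths = [f"exp/unified_transformer/{epoch}.pt" for epoch in range(190, 220)]
torch.save(average_checkpoints(paths), "exp/unified_transformer/avg_30.pt")
```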

### WER (%)

| Dataset    | chunk size | attention decoder | ctc greedy search | ctc prefix beam search | attention rescoring |
|------------|------------|-------------------|-------------------|------------------------|---------------------|
| Aidatatang | full       | 4.23              | 5.82              | 5.82                   | 4.71                |
|            | 16         | 4.59              | 6.99              | 6.99                   | 5.29                |
| Aishell    | full       | 4.69              | 5.80              | 5.80                   | 4.64                |
|            | 16         | 4.97              | 6.75              | 6.75                   | 5.37                |
| MagicData  | full       | 2.86              | 4.01              | 4.00                   | 3.07                |
|            | 16         | 3.10              | 5.02              | 5.02                   | 3.68                |
| THCHS-30   | full       | 16.68             | 15.46             | 15.46                  | 14.38               |
|            | 16         | 17.47             | 16.81             | 16.82                  | 15.63               |
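
In these tables, chunk size `full` is full-context (offline) decoding, while chunk size `16` restricts the encoder's self-attention to 16-frame chunks to simulate streaming; supporting both with one model is what makes it "unified". A minimal sketch of such a chunk attention mask, only to illustrate the idea (the exact masking used by this recipe may differ):

```python
# Chunk attention mask sketch: each frame may attend to its own chunk
# and to all earlier chunks, but not to future chunks.
import torch

def chunk_attention_mask(num_frames, chunk_size):
    """mask[i, j] is True when frame i is allowed to attend to frame j."""
    chunk_index = torch.arange(num_frames) // chunk_size
    return chunk_index.unsqueeze(1) >= chunk_index.unsqueeze(0)

print(chunk_attention_mask(6, chunk_size=2).int())
```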

## Unified Conformer Result

### Data info:

* Dataset: Aidatatang, Aishell, MagicData, Primewords, ST-CMDS, and THCHS-30.
* Feature info: using fbank feature, with cmvn, speed perturb (sketched below).
* Training info: lr 0.001, batch size 8, 1 machine, 1*8 = 8 GPUs, acc_grad 12, 60 epochs
* Decoding info: ctc_weight 0.5, average_num 10
* Git hash: 5bdf436e671ef4c696d1b039f29cc33109e072fa
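
Speed perturbation here means each training utterance is warped in tempo before feature extraction, typically to 0.9x, 1.0x, or 1.1x speed. A minimal sketch of one way to apply it on the fly, assuming torchaudio is available (the factors and the API choice are assumptions, not read from this recipe's config):

```python
# On-the-fly speed perturbation sketch; 0.9/1.0/1.1 are the conventional factors (assumed).
import random
import torchaudio

def speed_perturb(waveform, sample_rate, factors=(0.9, 1.0, 1.1)):
    """Warp the utterance's speed by a random factor, keeping the sample rate."""
    factor = random.choice(factors)
    if factor == 1.0:
        return waveform
    effects = [["speed", str(factor)], ["rate", str(sample_rate)]]
    perturbed, _ = torchaudio.sox_effects.apply_effects_tensor(
        waveform, sample_rate, effects)
    return perturbed
```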

### WER (%)

| Dataset    | chunk size | attention decoder | ctc greedy search | ctc prefix beam search | attention rescoring |
|------------|------------|-------------------|-------------------|------------------------|---------------------|
| Aidatatang | full       | 4.12              | 4.97              | 4.97                   | 4.22                |
|            | 16         | 4.45              | 5.73              | 5.73                   | 4.75                |
| Aishell    | full       | 4.49              | 5.07              | 5.05                   | 4.43                |
|            | 16         | 4.77              | 5.77              | 5.77                   | 4.85                |
| MagicData  | full       | 2.55              | 3.07              | 3.05                   | 2.59                |
|            | 16         | 2.81              | 3.88              | 3.86                   | 3.08                |
| THCHS-30   | full       | 13.55             | 13.75             | 13.76                  | 12.72               |
|            | 16         | 13.78             | 15.10             | 15.08                  | 13.90               |

## Unified Conformer Result (with TAL-ASR and AISHELL2)

### Data info:

* Dataset: Aidatatang, Aishell, MagicData, Primewords, ST-CMDS, THCHS-30, TAL-ASR, and AISHELL2.
* Feature info: using fbank feature, dither=0, cmvn, speed perturb
* Training info: lr 0.001, batch size 22, 4 GPUs, acc_grad 4, 120 epochs, dither 0.1
* Decoding info: ctc_weight 0.5, average_num 10 (attention rescoring with this ctc_weight is sketched below)
* Git hash: 66f30c197d00c59fdeda3bc8ada801f867b73f78
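
`ctc_weight 0.5` enters the attention rescoring mode reported in the tables: CTC prefix beam search proposes an n-best list, the attention decoder scores each candidate, and the final ranking combines the two scores. A minimal, self-contained sketch of that combination (the helper and the toy scores are illustrative only):

```python
# Attention-rescoring sketch: rank CTC n-best candidates by
# decoder_score + ctc_weight * ctc_score. Scores below are toy values.
def attention_rescoring(nbest, ctc_weight=0.5):
    """nbest: list of (tokens, decoder_score, ctc_score) tuples in the log domain."""
    return max(nbest, key=lambda h: h[1] + ctc_weight * h[2])[0]

nbest = [
    (["今", "天"], -1.2, -0.8),  # good CTC score
    (["金", "天"], -1.0, -2.5),  # slightly better decoder score, but poor CTC score
]
print(attention_rescoring(nbest, ctc_weight=0.5))  # -> ['今', '天']
```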

### WER (%)

| Dataset    | chunk size | attention decoder | ctc greedy search | ctc prefix beam search | attention rescoring |
|------------|------------|-------------------|-------------------|------------------------|---------------------|
| Aidatatang | full       | 3.22              | 4.00              | 4.01                   | 3.35                |
|            | 16         | 3.50              | 4.63              | 4.63                   | 3.79                |
| Aishell    | full       | 1.23              | 2.12              | 2.13                   | 1.42                |
|            | 16         | 1.33              | 2.72              | 2.72                   | 1.72                |
| MagicData  | full       | 2.38              | 3.07              | 3.05                   | 2.52                |
|            | 16         | 2.66              | 3.80              | 3.78                   | 2.94                |
| THCHS-30   | full       | 9.93              | 11.07             | 11.06                  | 10.16               |
|            | 16         | 10.28             | 11.85             | 11.85                  | 10.81               |
| AISHELL2   | full       | 5.25              | 5.81              | 5.79                   | 5.22                |
|            | 16         | 5.48              | 6.48              | 6.50                   | 5.61                |
| TAL-ASR    | full       | 9.54              | 10.35             | 10.28                  | 9.66                |
|            | 16         | 10.04             | 11.43             | 11.39                  | 10.55               |