# Performance Record

## Conformer Result Bidecoder (large)

* Encoder FLOPs(30s): 96,238,430,720, params: 85,709,704 (a profiling sketch follows the results table)
* Feature info: using fbank feature, cmvn, dither, online speed perturb
* Training info: train_conformer_bidecoder_large.yaml, kernel size 31, lr 0.002, batch size 12, 8 gpu, acc_grad 4, 120 epochs, dither 1.0
* Decoding info: ctc_weight 0.3, reverse weight 0.5, average_num 30 (the score combination is sketched after this list)
* Git hash: 65270043fc8c2476d1ab95e7c39f730017a670e0
* LM-tgmed: [3-gram.pruned.1e-7.arpa.gz](http://www.openslr.org/resources/11/3-gram.pruned.1e-7.arpa.gz)
* LM-tglarge: [3-gram.arpa.gz](http://www.openslr.org/resources/11/3-gram.arpa.gz)
* LM-fglarge: [4-gram.arpa.gz](http://www.openslr.org/resources/11/4-gram.arpa.gz)
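
The decoding weights above combine three scores per hypothesis. Below is a minimal sketch of that combination, assuming an n-best list already scored by CTC and by both decoder directions; all names and numbers are illustrative, not WeNet's API:

```python
ctc_weight, reverse_weight = 0.3, 0.5

# Hypothetical n-best entries: (text, ctc_score, att_score, r_att_score),
# where att_score is the left-to-right decoder score and r_att_score the
# right-to-left (bidecoder) score; all are log-probabilities.
nbest = [
    ("how are you", -4.2, -3.1, -3.3),
    ("how were you", -4.0, -3.9, -4.1),
]

def combined(ctc, att, r_att):
    # Blend the two decoder directions, then add the weighted CTC score.
    att_both = (1.0 - reverse_weight) * att + reverse_weight * r_att
    return att_both + ctc_weight * ctc

best = max(nbest, key=lambda h: combined(*h[1:]))
print(best[0])  # -> "how are you"
```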

| decoding mode                    | test clean | test other |
|----------------------------------|------------|------------|
| ctc prefix beam search           | 2.96       | 7.14       |
| attention rescoring              | 2.66       | 6.53       |
| LM-tgmed + attention rescoring   | 2.78       | 6.32       |
| LM-tglarge + attention rescoring | 2.68       | 6.10       |
| LM-fglarge + attention rescoring | 2.65       | 5.98       |
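
For orientation, encoder complexity numbers like the ones above can be reproduced with an off-the-shelf profiler. A hedged sketch using `thop` on a toy stand-in module (the real measurement would profile WeNet's encoder with its own forward signature; 30 s of audio at a 10 ms frame shift is roughly 3,000 fbank frames):

```python
import torch
import torch.nn as nn
from thop import profile  # pip install thop

# Toy stand-in for an encoder; profiling WeNet's encoder would need its
# actual module and forward arguments.
encoder = nn.Sequential(nn.Linear(80, 512), nn.ReLU(), nn.Linear(512, 512))

x = torch.randn(1, 3000, 80)  # ~30 s of 80-dim fbank at 10 ms frame shift
macs, params = profile(encoder, inputs=(x,))
print(f"MACs: {macs:,.0f}  params: {params:,.0f}")  # FLOPs ~= 2 x MACs
```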

## SqueezeFormer Result Bidecoder (large, FFN:2048)

* Encoder info:
    * SM12, reduce_idx 5, recover_idx 11, conv1d, batch_norm, syncbn (see the reduce/recover sketch after this list)
    * encoder_dim 512, output_size 512, head 8, ffn_dim 512*4=2048
    * Encoder FLOPs(30s): 82,283,704,832, params: 85,984,648
* Feature info:
    * using fbank feature, cmvn, dither, online speed perturb, spec_aug
* Training info:
    * train_squeezeformer_bidecoder_large.yaml, kernel size 31
    * batch size 12, 8 gpu, acc_grad 4, 120 epochs, dither 1.0
    * AdamW, lr 8e-4, NoamHold, warmup 0.2, hold 0.3, lr_decay 1.0
* Decoding info:
    * ctc_weight 0.3, reverse weight 0.5, average_num 30
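
`reduce_idx 5` / `recover_idx 11` describe SqueezeFormer's temporal U-Net: the frame rate is halved before block 5 and restored, with a skip connection, after block 11. A minimal sketch of the two operations, assuming 2x reduction and nearest-neighbor recovery (illustrative, not WeNet's exact implementation):

```python
import torch
import torch.nn as nn

class TimeReduce(nn.Module):
    """Halve the frame rate with a strided depthwise conv."""
    def __init__(self, dim: int):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, stride=2,
                              padding=1, groups=dim)

    def forward(self, x):                      # (B, T, D)
        return self.conv(x.transpose(1, 2)).transpose(1, 2)  # (B, T//2, D)

class TimeRecover(nn.Module):
    """Upsample back to the original rate and add the saved skip."""
    def forward(self, x, skip):                # (B, T//2, D), (B, T, D)
        up = torch.repeat_interleave(x, 2, dim=1)[:, :skip.size(1)]
        return up + skip

x = torch.randn(2, 100, 256)
skip = x                                       # features before reduce_idx
x = TimeReduce(256)(x)                         # blocks 5..11 run at T/2
x = TimeRecover()(x, skip)                     # from recover_idx on: full T
print(x.shape)                                 # torch.Size([2, 100, 256])
```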

| decoding mode                    | dev clean | dev other | test clean | test other |
|----------------------------------|-----------|-----------|------------|------------|
| ctc greedy search                | 2.55      | 6.62      | 2.73       | 6.59       |
| ctc prefix beam search           | 2.53      | 6.60      | 2.72       | 6.52       |
| attention decoder                | 2.93      | 6.56      | 3.31       | 6.47       |
| attention rescoring              | 2.19      | 6.06      | 2.45       | 5.85       |

## Conformer Result

* Encoder FLOPs(30s): 34,085,088,512, params: 34,761,608
* Feature info: using fbank feature, cmvn, dither, online speed perturb
* Training info: train_conformer.yaml, kernel size 31, lr 0.004, batch size 12, 8 gpu, acc_grad 4, 120 epochs, dither 0.1
* Decoding info: ctc_weight 0.5, average_num 30 (checkpoint averaging is sketched after this list)
* Git hash: 90d9a559840e765e82119ab72a11a1f7c1a01b78
* LM-fglarge: [4-gram.arpa.gz](http://www.openslr.org/resources/11/4-gram.arpa.gz)
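
`average_num 30` means the decoded model is a parameter-wise average of 30 saved checkpoints. WeNet's recipe provides a script for this; the sketch below shows only the core idea, with hypothetical paths, assuming each checkpoint file is a plain state_dict:

```python
import torch

# Hypothetical checkpoint paths: the last 30 of 120 epochs.
paths = [f"exp/conformer/{n}.pt" for n in range(91, 121)]

avg = None
for p in paths:
    state = torch.load(p, map_location="cpu")
    if avg is None:
        # Cast to float so integer buffers also average cleanly.
        avg = {k: v.clone().float() for k, v in state.items()}
    else:
        for k in avg:
            avg[k] += state[k].float()
for k in avg:
    avg[k] /= len(paths)
torch.save(avg, "exp/conformer/avg_30.pt")
```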

| decoding mode                    | test clean | test other |
|----------------------------------|------------|------------|
| ctc greedy search                | 3.51       | 9.57       |
| ctc prefix beam search           | 3.51       | 9.56       |
| attention decoder                | 3.05       | 8.36       |
| attention rescoring              | 3.18       | 8.72       |
| attention rescoring (beam 50)    | 3.12       | 8.55       |
| LM-fglarge + attention rescoring | 3.09       | 7.40       |

## Conformer Result (12 layers, FFN:2048)
* Encoder FLOPs(30s): 34,085,088,512, params: 34,761,608
* Feature info: using fbank feature, cmvn, dither, online speed perturb
* Training info:
    * train_squeezeformer.yaml, kernel size 31
    * batch size 12, 8 gpu, acc_grad 4, 120 epochs, dither 0.1
    * AdamW, lr 1e-3, NoamHold, warmup 0.2, hold 0.3, lr_decay 1.0
* Decoding info: ctc_weight 0.3, reverse weight 0.5, average_num 30

| decoding mode                    | dev clean | dev other | test clean | test other |
|----------------------------------|-----------|-----------|------------|------------|
| ctc greedy search                | 3.49      | 9.59      | 3.66       | 9.59       |
| ctc prefix beam search           | 3.49      | 9.61      | 3.66       | 9.55       |
| attention decoder                | 3.52      | 9.04      | 3.85       | 8.97       |
| attention rescoring              | 3.10      | 8.91      | 3.29       | 8.81       |

## SqueezeFormer Result (SM12, FFN:1024)
* Encoder info:
    * SM12, reduce_idx 5, recover_idx 11, conv2d, w/o syncbn
    * encoder_dim 256, output_size 256, head 4, ffn_dim 256*4=1024
    * Encoder FLOPs(30s): 21,158,877,440, params: 22,219,912
* Feature info:
    * using fbank feature, cmvn, dither, online speed perturb
* Training info:
    * train_squeezeformer.yaml, kernel size 31
    * batch size 12, 8 gpu, acc_grad 4, 120 epochs, dither 0.1
    * AdamW, lr 1e-3, NoamHold, warmup 0.2, hold 0.3, lr_decay 1.0 (see the schedule sketch after this list)
* Decoding info: ctc_weight 0.3, reverse weight 0.5, average_num 30
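
With batch size 12 on 8 GPUs and acc_grad 4, each optimizer step sees 12 × 8 × 4 = 384 utterances. The NoamHold settings mean: linear warmup over the first 20% of steps, hold the peak lr for the next 30%, then anneal. A hedged sketch of such a schedule (the exact decay form in WeNet's implementation may differ):

```python
def noam_hold_lr(step, total_steps, peak_lr=1e-3,
                 warmup=0.2, hold=0.3, decay_rate=1.0):
    """Linear warmup to peak_lr, hold, then polynomial decay."""
    warmup_steps = int(warmup * total_steps)
    hold_steps = int((warmup + hold) * total_steps)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    if step < hold_steps:
        return peak_lr
    # Decay with exponent decay_rate after the hold phase (assumed form).
    return peak_lr * (warmup_steps /
                      (step - hold_steps + warmup_steps)) ** decay_rate

for s in (0, 20_000, 60_000, 100_000):
    print(s, round(noam_hold_lr(s, total_steps=100_000), 6))
```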

| decoding mode                    | dev clean | dev other | test clean | test other |
|----------------------------------|-----------|-----------|------------|------------|
| ctc greedy search                | 3.49      | 9.24      | 3.51       | 9.28       |
| ctc prefix beam search           | 3.44      | 9.23      | 3.51       | 9.25       |
| attention decoder                | 3.59      | 8.74      | 3.75       | 8.70       |
| attention rescoring              | 2.97      | 8.48      | 3.07       | 8.44       |

## SqueezeFormer Result (SM12, FFN:2048)
* Encoder info:
    * SM12, reduce_idx 5, recover_idx 11, conv2d, w/o syncbn
    * encoder_dim 256, output_size 256, head 4, ffn_dim 256*8=2048
    * Encoder FLOPs(30s): 28,230,473,984, params: 34,827,400
* Feature info: using fbank feature, cmvn, dither, online speed perturb
* Training info:
    * train_squeezeformer.yaml, kernel size 31
    * batch size 12, 8 gpu, acc_grad 4, 120 epochs, dither 0.1
    * AdamW, lr 1e-3, NoamHold, warmup 0.2, hold 0.3, lr_decay 1.0
* Decoding info:
    * ctc_weight 0.3, reverse weight 0.5, average_num 30

| decoding mode                    | dev clean | dev other | test clean | test other |
|----------------------------------|-----------|-----------|------------|------------|
| ctc greedy search                | 3.34      | 9.01      | 3.47       | 8.85       |
| ctc prefix beam search           | 3.33      | 9.02      | 3.46       | 8.81       |
| attention decoder                | 3.64      | 8.62      | 3.91       | 8.33       |
| attention rescoring              | 2.89      | 8.34      | 3.10       | 8.03       |

## SqueezeFormer Result (SM12, FFN:1312)
* Encoder info:
    * SM12, reduce_idx 5, recover_idx 11, conv1d, w/o syncbn
    * encoder_dim 328, output_size 256, head 4, ffn_dim 328*4=1312
    * Encoder FLOPs(30s): 34,103,960,008, params: 35,678,352
* Feature info:
    * using fbank feature, cmvn, dither, online speed perturb
* Training info:
    * train_squeezeformer.yaml, kernel size 31
    * batch size 12, 8 gpu, acc_grad 4, 120 epochs, dither 1.0
    * AdamW, lr 1e-3, NoamHold, warmup 0.2, hold 0.3, lr_decay 1.0
* Decoding info:
    * ctc_weight 0.3, reverse weight 0.5, average_num 30

| decoding mode                    | dev clean | dev other | test clean | test other |
|----------------------------------|-----------|-----------|------------|------------|
| ctc greedy search                | 3.20      | 8.46      | 3.30       | 8.58       |
| ctc prefix beam search           | 3.18      | 8.44      | 3.30       | 8.55       |
| attention decoder                | 3.38      | 8.31      | 3.89       | 8.32       |
| attention rescoring              | 2.81      | 7.86      | 2.96       | 7.91       |

## Conformer U2++ Result

* Feature info: using fbank feature, cmvn, no speed perturb, dither
* Training info: train_u2++_conformer.yaml, lr 0.001, batch size 24, 8 gpu, acc_grad 1, 120 epochs, dither 1.0
* Decoding info: ctc_weight 0.3, reverse weight 0.5, average_num 30; the 'full' and '16' columns below are decoding chunk sizes (see the mask sketch after this list)
* Git hash: 65270043fc8c2476d1ab95e7c39f730017a670e0
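
The 'full' and '16' columns differ only in the self-attention mask applied at decoding time. A minimal sketch of a chunk-based mask in the spirit of U2 (illustrative, not WeNet's exact helper); setting `chunk_size = num_frames` reproduces full-context decoding:

```python
import torch

def chunk_attention_mask(num_frames: int, chunk_size: int) -> torch.Tensor:
    """Boolean mask: frame i may attend to every frame up to the end
    of its own chunk."""
    idx = torch.arange(num_frames)
    chunk_end = (idx // chunk_size + 1) * chunk_size
    return idx.unsqueeze(0) < chunk_end.unsqueeze(1)

print(chunk_attention_mask(6, 2).int())
# tensor([[1, 1, 0, 0, 0, 0],
#         [1, 1, 0, 0, 0, 0],
#         [1, 1, 1, 1, 0, 0],
#         [1, 1, 1, 1, 0, 0],
#         [1, 1, 1, 1, 1, 1],
#         [1, 1, 1, 1, 1, 1]])
```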

test clean

| decoding mode / chunk size    | full | 16   |
|--------------------------------|------|------|
| ctc prefix beam search         | 3.76 | 4.54 |
| attention rescoring            | 3.32 | 3.80 |

test other

| decoding mode / chunk size    | full  | 16    |
|--------------------------------|-------|-------|
| ctc prefix beam search         | 9.50  | 11.52 |
| attention rescoring            | 8.67  | 10.38 |

## SqueezeFormer Result (U2++, FFN:2048)

* Encoder info:
    * SM12, reduce_idx 5, recover_idx 11, conv1d, layer_norm
    * do_rel_shift false, warp_for_time, syncbn
    * encoder_dim 256, output_size 256, head 4, ffn_dim 256*8=2048
    * Encoder FLOPs(30s): 28,255,337,984, params: 34,893,704
* Feature info:
    * using fbank feature, cmvn, dither, online speed perturb
* Training info:
    * train_squeezeformer.yaml, kernel size 31
    * batch size 12, 8 gpu, acc_grad 2, 120 epochs, dither 1.0
    * AdamW, lr 8e-4, NoamHold, warmup 0.2, hold 0.3, lr_decay 1.0
* Decoding info:
    * ctc_weight 0.3, reverse weight 0.5, average_num 30

test clean

| decoding mode / chunk size    | full | 16   |
|--------------------------------|------|------|
| ctc prefix beam search         | 3.45 | 4.34 |
| attention rescoring            | 3.07 | 3.71 |

test other

| decoding mode / chunk size    | full  | 16    |
|--------------------------------|-------|-------|
| ctc prefix beam search         | 8.29  | 10.60 |
| attention rescoring            | 7.58  | 9.60  |

## Conformer U2 Result

* Feature info: using fbank feature, cmvn, speed perturb, dither
* Training info: train_unified_conformer.yaml, lr 0.001, batch size 10, 8 gpu, acc_grad 1, 120 epochs, dither 1.0
* Decoding info: ctc_weight 0.5, average_num 30
* Git hash: 90d9a559840e765e82119ab72a11a1f7c1a01b78
* LM-tgmed: [3-gram.pruned.1e-7.arpa.gz](http://www.openslr.org/resources/11/3-gram.pruned.1e-7.arpa.gz)
* LM-tglarge: [3-gram.arpa.gz](http://www.openslr.org/resources/11/3-gram.arpa.gz)
* LM-fglarge: [4-gram.arpa.gz](http://www.openslr.org/resources/11/4-gram.arpa.gz) (see the ARPA sanity-check sketch below)
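
The files above are standard ARPA n-gram LMs. As a hedged, generic illustration (not WeNet's actual LM integration, which happens inside its decoder), a gunzipped ARPA file can be sanity-checked with `kenlm`:

```python
import kenlm  # pip install kenlm

# Assumes 4-gram.arpa.gz from the LM-fglarge link was downloaded and
# gunzipped next to this script.
lm = kenlm.Model("4-gram.arpa")

# Total log10 probability of a sentence, with <s> and </s> added.
print(lm.score("how are you", bos=True, eos=True))
```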

test clean

| decoding mode / chunk size       | full | 16   |
|----------------------------------|------|------|
| ctc prefix beam search           | 4.26 | 5.00 |
| attention decoder                | 3.05 | 3.44 |
| attention rescoring              | 3.72 | 4.10 |
| attention rescoring (beam 50)    | 3.57 | 3.95 |
| LM-tgmed + attention rescoring   | 3.56 | 4.02 |
| LM-tglarge + attention rescoring | 3.40 | 3.82 |
| LM-fglarge + attention rescoring | 3.38 | 3.74 |

test other

| decoding mode / chunk size       | full  | 16    |
|----------------------------------|-------|-------|
| ctc prefix beam search           | 10.87 | 12.87 |
| attention decoder                | 9.07  | 10.44 |
| attention rescoring              | 9.74  | 11.61 |
| attention rescoring (beam 50)    | 9.34  | 11.13 |
| LM-tgmed + attention rescoring   | 8.78  | 10.26 |
| LM-tglarge + attention rescoring | 8.34  | 9.74  |
| LM-fglarge + attention rescoring | 8.17  | 9.44  |