# Conformer-Transducer with auxiliary task (CTC weight = 0.5)

## Environments
- Same as RNN-Transducer (see below)

## Config files
- preprocess config: `conf/specaug.yaml`
- train config: `conf/tuning/transducer/train_conformer-rnn_transducer_aux_ngpu4.yaml`
- lm config: `-` (LM was not used)
- decode config: `conf/tuning/transducer/decode_default.yaml`
- ngpu: `4`

## Results (CER)
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_dev_decode_default|14326|205341|95.8|4.0|0.2|0.1|4.3|33.6|
|decode_test_decode_default|7176|104765|95.3|4.4|0.2|0.1|4.8|36.3|

# Conformer-Transducer

## Environments
- Same as RNN-Transducer (see below)

## Config files
- preprocess config: `conf/specaug.yaml`
- train config: `conf/tuning/transducer/train_conformer-rnn_transducer.yaml`
- lm config: `-` (LM was not used)
- decode config: `conf/tuning/transducer/decode_default.yaml`

## Results (CER)
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_dev_decode_default|14326|205341|95.6|4.2|0.2|0.1|4.5|34.0|
|decode_test_decode_default|7176|104765|95.0|4.7|0.3|0.1|5.0|37.1|

# RNN-Transducer with auxiliary task (CTC weight = 0.1)

## Environments
- Same as RNN-Transducer (see below)

## Config files
- preprocess config: `conf/specaug.yaml`
- train config: `conf/tuning/transducer/train_transducer_aux.yaml`
- lm config: `-` (LM was not used)
- decode config: `conf/tuning/transducer/decode_default.yaml`

## Results (CER)
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_dev_decode_default|14326|205341|93.9|5.8|0.3|0.1|6.3|41.9|
|decode_test_decode_default|7176|104765|93.2|6.5|0.4|0.1|6.9|44.5|

# RNN-Transducer

## Environments
- date: `Thu May 20 05:29:03 UTC 2021`
- python version: `3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]`
- espnet version: `espnet 0.9.8`
- chainer version: `chainer 6.0.0`
- pytorch version: `pytorch 1.6.0`
- Git hash: `95b3008cdcc2247e781a048bc999243dc7f45fe7`
  - Commit date: `Sat Mar 6 00:48:29 2021 +0000`

## Config files
- preprocess config: `conf/specaug.yaml`
- train config: `conf/tuning/transducer/train_transducer.yaml`
- lm config: `-` (LM was not used)
- decode config: `conf/tuning/transducer/decode_default.yaml`

## Results (CER)
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_dev_decode_default|14326|205341|93.8|5.9|0.3|0.1|6.3|42.0|
|decode_test_decode_default|7176|104765|92.9|6.7|0.3|0.1|7.2|45.9|

# Conformer (kernel size = 15) + SpecAugment + LM weight = 0.0 result

- training config file: `conf/tuning/train_pytorch_conformer_kernel15.yaml`
- preprocess config file: `conf/specaug.yaml`
- decoding config file: `conf/decode.yaml`, set `lm-weight = 0.0`
- model link: https://drive.google.com/file/d/1pOhwj6JFqVyt5quW7BKWfJ3vfPFRoxpQ/view?usp=sharing

```
exp/train_sp_pytorch_train_pytorch_conformer_kernel15_specaug/decode_dev_decode_lm0.0/result.txt
| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
| Sum/Avg | 14326 205341 | 95.4 4.5 0.1 0.1 4.6 36.0 |
exp/train_sp_pytorch_train_pytorch_conformer_kernel15_specaug/decode_test_decode_lm0.0/result.txt
| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
| Sum/Avg | 7176 104765 | 95.0 4.9 0.1 0.1 5.1 38.6 |
```

# Conformer (kernel size = 31) + SpecAugment + LM weight = 0.0 result

- training config file: `conf/tuning/train_pytorch_conformer_kernel31.yaml`
- preprocess config file: `conf/specaug.yaml`
- decoding config file: `conf/decode.yaml`, set `lm-weight = 0.0`

```
exp/train_sp_pytorch_train_pytorch_conformer_kernel31_specaug/decode_dev_decode_lm0.0/result.txt
| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
| Sum/Avg | 14326 205341 | 95.4 4.5 0.1 0.1 4.7 36.2 |
exp/train_sp_pytorch_train_pytorch_conformer_kernel31_specaug/decode_test_decode_lm0.0/result.txt
| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
| Sum/Avg | 7176 104765 | 94.9 5.0 0.1 0.1 5.2 39.1 |
```

# Conformer (kernel size = 31) result

- training config file: `conf/tuning/train_pytorch_conformer_kernel31.yaml`
- decoding config file: `conf/decode.yaml`

```
exp/train_sp_pytorch_train_pytorch_conformer_kernel31/decode_dev_decode/result.txt
| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
| Sum/Avg | 14326 205341 | 94.9 5.0 0.1 0.1 5.2 38.3 |
exp/train_sp_pytorch_train_pytorch_conformer_kernel31/decode_test_decode/result.txt
| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
| Sum/Avg | 7176 104765 | 94.2 5.4 0.2 0.1 5.8 41.0 |
```

# Transformer result (default transformer with initial learning rate = 1.0 and epochs = 50)

- Environments (obtained by `$ get_sys_info.sh`)
  - date: `Mon Jun 10 12:34:41 EDT 2019`
  - system information: `Linux b14 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 GNU/Linux`
  - python version: `Python 3.7.3`
  - espnet version: `espnet 0.3.1`
  - chainer version: `chainer 6.0.0`
  - pytorch version: `pytorch 1.0.1.post2`
  - Git hash: `82e9b7eb7ccae61e11af28981734ea1c2b315a98`
- Model files (archived to model.v1.tar.gz by `$ pack_model.sh`)
  - model link: https://drive.google.com/open?id=1BIQBpLRRy3XSMT5IRxnLcgLMirGzu8dg
  - training config file: `conf/train.yaml`
  - decoding config file: `conf/decode.yaml`
  - cmvn file: `data/train_sp/cmvn.ark`
  - e2e file: `exp/train_sp_pytorch_train_pytorch_transformer_lr1.0/results/model.last10.avg.best`
  - e2e JSON file: `exp/train_sp_pytorch_train_pytorch_transformer_lr1.0/results/model.json`
  - lm file: `exp/train_rnnlm_pytorch_lm/rnnlm.model.best`
  - lm JSON file: `exp/train_rnnlm_pytorch_lm/model.json`
- Results (paste them by yourself or obtained by `$ pack_model.sh --results `)

```
exp/train_sp_pytorch_train_pytorch_transformer_lr1.0/decode_dev_decode_pytorch_transformer_lm/result.txt
| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
| Sum/Avg | 14326 205341 | 94.1 5.7 0.2 0.1 6.0 42.0 |
exp/train_sp_pytorch_train_pytorch_transformer_lr1.0/decode_test_decode_pytorch_transformer_lm/result.txt
| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
| Sum/Avg | 7176 104765 | 93.4 6.4 0.2 0.1 6.7 45.1 |
```

# First result (no tuning, but already very good. cf. Kaldi chain best 7.43% and nnet3 8.64% while ESPnet 8.0%)

```
exp/train_sp_pytorch_no_patience/decode_dev_beam20_emodel.acc.best_p0.0_len0.0-0.0_ctcw0.6_rnnlm0.3_2layer_unit650_sgd_bs64/result.txt:
| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
| Sum/Avg | 14326 205341 | 93.3 6.5 0.2 0.1 6.8 45.2 |
exp/train_sp_pytorch_no_patience/decode_test_beam20_emodel.acc.best_p0.0_len0.0-0.0_ctcw0.6_rnnlm0.3_2layer_unit650_sgd_bs64/result.txt:
| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
| Sum/Avg | 7176 104765 | 92.2 7.6 0.2 0.2 8.0 50.2 |
```

# Ngram related

- decoding with ngram and RNNLM

```
exp/train_sp_pytorch_train_pytorch_transformer_lr1.0/decode_dev_decode_pytorch_transformer_lm0.7_4gramfull0.3/result.txt
| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
| Sum/Avg | 14326 205341 | 94.1 5.7 0.2 0.1 6.0 41.7 |
exp/train_sp_pytorch_train_pytorch_transformer_lr1.0/decode_test_decode_pytorch_transformer_lm0.7_4gramfull0.3/result.txt
| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
| Sum/Avg | 7176 104765 | 93.5 6.3 0.2 0.1 6.6 44.6 |
```

```
exp/train_sp_pytorch_train_pytorch_transformer_lr1.0/decode_dev_decode_pytorch_transformer_lm0.7_4grampart0.3/result.txt
| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
| Sum/Avg | 14326 205341 | 94.1 5.7 0.2 0.1 6.0 41.7 |
exp/train_sp_pytorch_train_pytorch_transformer_lr1.0/decode_test_decode_pytorch_transformer_lm0.7_4grampart0.3/result.txt
| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
| Sum/Avg | 7176 104765 | 93.5 6.3 0.2 0.1 6.6 44.6 |
```

- only e2e model

```
exp/train_sp_pytorch_train_pytorch_transformer_lr1.0/decode_dev_decode_pytorch_transformer/result.txt
| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
| Sum/Avg | 14326 205341 | 93.6 6.2 0.2 0.1 6.5 45.6 |
exp/train_sp_pytorch_train_pytorch_transformer_lr1.0/decode_test_decode_pytorch_transformer/result.txt
| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
| Sum/Avg | 7176 104765 | 92.7 7.1 0.2 0.1 7.4 49.8 |
```
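
# Comparing results

The `Sum/Avg` rows in the `result.txt` excerpts of this file share one layout, so they can be compared programmatically. The following is a minimal sketch and not part of the recipe: `parse_sum_avg` and `relative_reduction` are hypothetical helper names, and the regular expression assumes the row layout exactly as it appears in this file.

```python
import re

# Assumed layout: "| Sum/Avg | <Snt> <Wrd> | <Corr> <Sub> <Del> <Ins> <Err> <S.Err> |"
SUM_AVG = re.compile(
    r"\|\s*Sum/Avg\s*\|\s*(\d+)\s+(\d+)\s*\|\s*([\d.\s]+?)\s*\|"
)


def parse_sum_avg(line):
    """Parse one Sum/Avg row into a dict of counts and percentages."""
    match = SUM_AVG.search(line)
    if match is None:
        raise ValueError(f"not a Sum/Avg row: {line!r}")
    # Unpacking fails loudly if the row does not hold exactly six scores.
    corr, sub, dele, ins, err, s_err = (float(x) for x in match.group(3).split())
    return {
        "Snt": int(match.group(1)),
        "Wrd": int(match.group(2)),
        "Corr": corr,
        "Sub": sub,
        "Del": dele,
        "Ins": ins,
        "Err": err,
        "S.Err": s_err,
    }


def relative_reduction(baseline_err, new_err):
    """Relative error reduction in percent, going from baseline_err to new_err."""
    return (baseline_err - new_err) / baseline_err * 100.0


if __name__ == "__main__":
    # Test-set row of the "only e2e model" Transformer result.
    row = parse_sum_avg("| Sum/Avg | 7176 104765 | 92.7 7.1 0.2 0.1 7.4 49.8 |")
    print(row["Err"])
    # First result (test Err 8.0) vs. Conformer-Transducer with auxiliary task (test Err 4.8).
    print(f"{relative_reduction(8.0, 4.8):.1f}% relative CER reduction")
```

Note that the reported `Err` column is rounded independently of `Sub`, `Del`, and `Ins`, so the three components do not always sum exactly to it (for example, the kernel size 15 Conformer dev row lists 4.5, 0.1, 0.1 and an `Err` of 4.6).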