README.md

# Speech Recognition Inference with CUDA CTC Beam Search Decoder

This is an example inference script for running decoding on the LibriSpeech dataset and [zipformer](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_ctc) models, using a CUDA-based CTC beam search decoder that supports parallel decoding through batch and vocabulary axises.

## Usage
Additional command line parameters and information can is available with the `--help` option.

Sample command

```
pip install sentencepiece

# download pretrained files
wget -nc https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-ctc-2022-12-01/resolve/main/data/lang_bpe_500/bpe.model
wget -nc https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-ctc-2022-12-01/resolve/main/exp/cpu_jit.pt

python inference.py \
    --librispeech_path ./librispeech/ \
    --split test-other \
    --model ./cpu_jit.pt \
    --bp-model ./bpe.model \
    --beam-size 10 \
    --blank-skip-threshold 0.95
```

## Results
The table below contains throughput and WER benchmark results on librispeech test_other set between cuda ctc decoder and flashlight cpu decoder.

(Note: batch_size=4, beam_size=10, nbest=10, vocab_size=500, no LM, Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz, V100 GPU)

| Decoder | Setting | WER (%) | N-Best Oracle WER (%) | Decoder Cost Time (seconds) |
|:-----------|-----------:|-----------:|-----------:|-----------:|
|CUDA decoder|blank_skip_threshold=0.95| 5.81 | 4.11 | 2.57 |
|CUDA decoder|blank_skip_threshold=1.0 (no frame-skip)| 5.81 | 4.09 | 6.24 |
|flashlight decoder|beam_size_token=10| 5.86 | 4.30 | 28.61 |
|flashlight decoder|beam_size_token=vocab_size| 5.86 | 4.30 | 791.80 |