Add cuctc decoder (#3096)
Summary: This PR implements a CUDA based ctc prefix beam search decoder. Attach serveral benchmark results using V100 below: |decoder type| model |datasets | decoding time (secs)| beam size | batch size | model unit | subsampling times | vocab size | |--------------|---------|------|-----------------|------------|-------------|------------|-----------------------|------------| | cuctc | conformer nemo |dev clean |7.68s | 8 | 32 | bpe | 4 | 1000| | cuctc | conformer nemo |dev clean (sort by length) |1.6s | 8 | 32 | bpe | 4 | 1000| | cuctc | wav2vec2.0 torchaudio |dev clean |22s | 10 | 1 | char | 2 | 29| | cuctc | conformer espnet |aishell1 test | 5s | 10 | 24 | char | 4 | 4233| Note: 1. The design is to parallel computation through batch and vocab axis, for loop the frames axis. So it's more friendly with smaller sequence lengths, larger vocab size comparing with CPU implementations. 2. WER is the same as CPU implementations. However, it can't decode with LM now. Resolves: https://github.com/pytorch/audio/issues/2957. Pull Request resolved: https://github.com/pytorch/audio/pull/3096 Reviewed By: nateanl Differential Revision: D44709397 Pulled By: mthrok fbshipit-source-id: 3078c54a2b44dc00eb4a81b4c657487eeff8c155
Showing
Please register or sign in to comment