    Add cuctc decoder (#3096) · 0a1801ed
    Yuekai Zhang authored
    Summary:
    This PR implements a CUDA-based CTC prefix beam search decoder.
    
    Several benchmark results measured on a V100 GPU are attached below:
    | decoder type | model                 | dataset                    | decoding time (s) | beam size | batch size | model unit | subsampling factor | vocab size |
    |--------------|-----------------------|----------------------------|-------------------|-----------|------------|------------|--------------------|------------|
    | cuctc        | conformer nemo        | dev clean                  | 7.68              | 8         | 32         | bpe        | 4                  | 1000       |
    | cuctc        | conformer nemo        | dev clean (sort by length) | 1.6               | 8         | 32         | bpe        | 4                  | 1000       |
    | cuctc        | wav2vec2.0 torchaudio | dev clean                  | 22                | 10        | 1          | char       | 2                  | 29         |
    | cuctc        | conformer espnet      | aishell1 test              | 5                 | 10        | 24         | char       | 4                  | 4233       |
    
    Note:
    1. The design parallelizes computation across the batch and vocabulary axes and loops sequentially over the frame axis. Compared with CPU implementations, it therefore favors shorter sequences and larger vocabularies.
    2. WER is the same as with the CPU implementations. However, decoding with a language model (LM) is not supported yet.
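
For readers unfamiliar with the algorithm, the following is a minimal pure-Python sketch of CTC prefix beam search for a single utterance, without an LM. It is not the CUDA kernel from this PR; the function names (`ctc_prefix_beam_search`, `logsumexp`) and the input layout are assumptions made for illustration. It only shows why the frame loop is inherently sequential while the per-token work inside each frame is independent, which is the part a GPU implementation could spread across the vocabulary (and batch) axes.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_prefix_beam_search(log_probs, beam_size, blank=0):
    """Hypothetical reference decoder.

    log_probs: list of frames, each a list of per-token log-probabilities.
    Returns a list of (token_ids, log_score), best hypothesis first.
    """
    # Each prefix maps to (log P of paths ending in blank,
    #                      log P of paths ending in non-blank).
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:  # sequential: frame t depends on frame t-1
        next_beams = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for token, lp in enumerate(frame):  # independent per token
                if token == blank:
                    # Blank keeps the prefix unchanged.
                    b, nb = next_beams[prefix]
                    next_beams[prefix] = (logsumexp(b, p_b + lp, p_nb + lp), nb)
                    continue
                new_prefix = prefix + (token,)
                b, nb = next_beams[new_prefix]
                if prefix and prefix[-1] == token:
                    # Repeated token: only blank-ending paths extend the prefix...
                    next_beams[new_prefix] = (b, logsumexp(nb, p_b + lp))
                    # ...while non-blank-ending paths collapse into the same prefix.
                    sb, snb = next_beams[prefix]
                    next_beams[prefix] = (sb, logsumexp(snb, p_nb + lp))
                else:
                    next_beams[new_prefix] = (b, logsumexp(nb, p_b + lp, p_nb + lp))
        # Keep only the beam_size most probable prefixes.
        ranked = sorted(next_beams.items(),
                        key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_size])
    return [(list(p), logsumexp(*s)) for p, s in beams.items()]
```

A call such as `ctc_prefix_beam_search(frames, beam_size=8)` returns the surviving hypotheses sorted by total log-probability. The cost of the outer loop grows with the number of frames, which is consistent with note 1: shorter sequences mean fewer sequential steps, and a larger vocabulary means more independent work per step.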
    
    Resolves: https://github.com/pytorch/audio/issues/2957.
    
    Pull Request resolved: https://github.com/pytorch/audio/pull/3096
    
    Reviewed By: nateanl
    
    Differential Revision: D44709397
    
    Pulled By: mthrok
    
    fbshipit-source-id: 3078c54a2b44dc00eb4a81b4c657487eeff8c155
models.decoder.rst 715 Bytes