    Add note for lexicon free decoder output (#2603) · 33485b8c
    Caroline Chen authored
    Summary:
    The ``words`` field of ``CTCHypothesis`` is empty if no lexicon is provided. This produces confusing output when following our tutorial example with lexicon-free decoding (see issue https://github.com/pytorch/audio/issues/2584). This PR adds a note to both the docs and the tutorial.
    
    Followup: determine whether to change the behavior of ``words`` in the lexicon-free case. One option is to merge the generated tokens and then split them on the input silence token to populate the ``words`` field. This is tricky, though: the meaning of a "word" in the lexicon-free case can be vague, and not all languages put whitespace between words.
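The "merge, then split on the silence token" option above can be sketched in plain Python. This is a minimal illustration, not torchaudio code: `words_from_tokens` is a hypothetical helper. It assumes the input is a list of token strings that have already been CTC-collapsed (repeats merged, blanks removed), for example obtained from a hypothesis's token indices through the decoder's `idxs_to_tokens`. It also assumes the silence token is `"|"`, the default `sil_token` of torchaudio's `ctc_decoder`.

```python
from typing import List, Sequence


def words_from_tokens(tokens: Sequence[str], sil_token: str = "|") -> List[str]:
    """Hypothetical helper: merge decoded tokens, then split on the silence token.

    Assumes ``tokens`` are already collapsed CTC output (no blanks, no merged
    repeats) and that ``sil_token`` marks word boundaries.
    """
    # Merge all tokens into a single string.
    merged = "".join(tokens)
    # Split on the silence token. Leading, trailing, or repeated silence
    # tokens leave empty strings behind, so drop those.
    return [word for word in merged.split(sil_token) if word]


tokens = ["|", "h", "e", "l", "l", "o", "|", "w", "o", "r", "l", "d", "|"]
print(words_from_tokens(tokens))            # ['hello', 'world']
print(" ".join(words_from_tokens(tokens)))  # hello world
```

If a language's token stream contains no silence token, the whole hypothesis comes back as a single "word". This illustrates why defining ``words`` for lexicon-free decoding is ambiguous.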
    
    Pull Request resolved: https://github.com/pytorch/audio/pull/2603
    
    Reviewed By: mthrok
    
    Differential Revision: D38459709
    
    Pulled By: carolineechen
    
    fbshipit-source-id: d64ff186df4633f00e94c64afeaa6a50cebf2934