Commit 6854eedf authored by Caroline Chen, committed by Facebook GitHub Bot

Update CTC decoder docs (#2136)

Summary:
after addition of tutorial and librispeech example with [WER results](https://github.com/pytorch/audio/pull/2130/files#diff-5f82be20f11a10a4cb411007df965b84eaab98fcc7ed49b42be1dd9916203193R20), we can remove README from decoder csrc directory and move the rest of the information to the corresponding documentation/main README

Pull Request resolved: https://github.com/pytorch/audio/pull/2136

Reviewed By: mthrok

Differential Revision: D33419449

Pulled By: carolineechen

fbshipit-source-id: 6cb29280f639af46834bec935b45f3c5a8ee350f
parent 5c4c61b2
@@ -60,7 +60,7 @@ Please refer to https://pytorch.org/get-started/locally/ for the details.
### From Source
On non-Windows platforms, the build process builds libsox and the codecs that torchaudio needs to link to. It will fetch and build libmad, lame, flac, vorbis, opus, and libsox before building the extension. This process requires `cmake` and `pkg-config`. libsox-based features can be disabled with `BUILD_SOX=0`.
The build process also builds the RNN transducer loss and CTC beam search decoder. These functionalities can be disabled by setting the environment variables `BUILD_RNNT=0` and `BUILD_CTC_DECODER=0`, respectively.
```bash
# Linux
...
```
# Flashlight Decoder Binding
CTC decoder with KenLM and lexicon support, based on the [flashlight](https://github.com/flashlight/flashlight) decoder implementation and the fairseq [KenLMDecoder](https://github.com/pytorch/fairseq/blob/fcca32258c8e8bcc9f9890bf4714fa2f96b6b3e1/examples/speech_recognition/new/decoders/flashlight_decoder.py#L53) Python wrapper.
## Setup
### Build torchaudio with decoder support
```
BUILD_CTC_DECODER=1 python setup.py develop
```
## Usage
```py
from torchaudio.prototype.ctc_decoder import kenlm_lexicon_decoder
decoder = kenlm_lexicon_decoder(args...)
results = decoder(emissions)  # list of shape (B, nbest); each entry has "tokens", "score", and "words" keys
best_transcripts = [" ".join(results[i][0].words).strip() for i in range(B)]
```
## Required Files
- tokens: the tokens for which the acoustic model generates probabilities
- lexicon: mapping between words and their corresponding spellings
- language model: n-gram KenLM model
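As a hypothetical illustration of the tokens and lexicon formats (the actual token set and lexicon depend on your acoustic model and corpus), each lexicon line is a word followed by its space-separated spelling:

```python
# Made-up file contents illustrating the expected formats.
tokens_txt = "\n".join(["-", "|", "e", "t", "a"])  # one token per line

lexicon_txt = "\n".join([
    "eat e a t |",  # word followed by its space-separated spelling
    "tea t e a |",
])

def parse_lexicon(text):
    """Map each word to its list of spelling tokens."""
    lexicon = {}
    for line in text.splitlines():
        word, *spelling = line.split()
        lexicon[word] = spelling
    return lexicon

print(parse_lexicon(lexicon_txt)["eat"])  # ['e', 'a', 't', '|']
```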
## Experiment Results
Word error rates (WER) on LibriSpeech dev-other and test-other using pretrained [Wav2Vec2](https://arxiv.org/pdf/2006.11477.pdf) models of the BASE configuration.
| Model | Decoder | dev-other | test-other | beam search params |
| ----------- | ---------- | ----------- | ---------- |-------------------------------------------- |
| BASE_10M    | Greedy     | 51.6        | 51.0       |                                             |
| | 4-gram LM | 15.95 | 15.9 | LM weight=3.23, word score=-0.26, beam=1500 |
| BASE_100H | Greedy | 13.6 | 13.3 | |
| | 4-gram LM | 8.5 | 8.8 | LM weight=2.15, word score=-0.52, beam=50 |
| BASE_960H | Greedy | 8.9 | 8.4 | |
| | 4-gram LM | 6.3 | 6.4 | LM weight=1.74, word score=0.52, beam=50 |
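The `LM weight` and `word score` parameters in the table control how scores are combined during beam search: in flashlight-style lexicon decoding, a hypothesis score is roughly the acoustic score plus `lm_weight` times the language-model score plus `word_score` for each emitted word. A toy sketch with invented numbers:

```python
# Hedged sketch of flashlight-style hypothesis scoring; the scores
# below are invented for illustration, not taken from a real decode.
def hypothesis_score(am_score, lm_score, num_words, lm_weight=1.74, word_score=0.52):
    """Combine acoustic and LM scores the way the tuned parameters suggest."""
    return am_score + lm_weight * lm_score + word_score * num_words

# e.g. a 3-word hypothesis with acoustic score -12.0 and LM score -4.0
print(hypothesis_score(am_score=-12.0, lm_score=-4.0, num_words=3))
```

A higher `lm_weight` shifts the balance toward the language model, while a positive `word_score` counteracts the LM's bias toward shorter transcripts.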
@@ -181,15 +181,17 @@ def kenlm_lexicon_decoder(
Builds Ken LM CTC Lexicon Decoder with given parameters
Args:
lexicon (str): lexicon file containing the possible words and corresponding spellings.
    Each line consists of a word and its space separated spelling
tokens (str or List[str]): file or list containing valid tokens. If using a file, the expected
    format is for tokens mapping to the same index to be on the same line
kenlm (str): file containing language model
nbest (int, optional): number of best decodings to return (Default: 1)
beam_size (int, optional): max number of hypos to hold after each decode step (Default: 50)
beam_size_token (int, optional): max number of tokens to consider at each decode step.
    If None, it is set to the total number of tokens (Default: None)
beam_threshold (float, optional): threshold for pruning hypothesis (Default: 50)
lm_weight (float, optional): weight of language model (Default: 2)
word_score (float, optional): word insertion score (Default: 0)
unk_score (float, optional): unknown word insertion score (Default: -inf)
sil_score (float, optional): silence insertion score (Default: 0)
@@ -200,6 +202,14 @@ def kenlm_lexicon_decoder(
Returns:
    KenLMLexiconDecoder: decoder
Example
>>> decoder = kenlm_lexicon_decoder(
>>> lexicon="lexicon.txt",
>>> tokens="tokens.txt",
>>> kenlm="kenlm.bin",
>>> )
>>> results = decoder(emissions) # List of shape (B, nbest) of Hypotheses
""" """
lexicon = _load_words(lexicon) lexicon = _load_words(lexicon)
word_dict = _create_word_dict(lexicon) word_dict = _create_word_dict(lexicon)
......