Commit 6854eedf authored by Caroline Chen, committed by Facebook GitHub Bot

Update CTC decoder docs (#2136)

Summary:
after addition of tutorial and librispeech example with [WER results](https://github.com/pytorch/audio/pull/2130/files#diff-5f82be20f11a10a4cb411007df965b84eaab98fcc7ed49b42be1dd9916203193R20), we can remove README from decoder csrc directory and move the rest of the information to the corresponding documentation/main README

Pull Request resolved: https://github.com/pytorch/audio/pull/2136

Reviewed By: mthrok

Differential Revision: D33419449

Pulled By: carolineechen

fbshipit-source-id: 6cb29280f639af46834bec935b45f3c5a8ee350f
parent 5c4c61b2
@@ -60,7 +60,7 @@ Please refer to https://pytorch.org/get-started/locally/ for the details.
### From Source
On non-Windows platforms, the build process builds libsox and the codecs that torchaudio needs to link to. It will fetch and build libmad, lame, flac, vorbis, opus, and libsox before building the extension. This process requires `cmake` and `pkg-config`. libsox-based features can be disabled with `BUILD_SOX=0`.
The build process also builds the RNN transducer loss and CTC beam search decoder. These functionalities can be disabled by setting the environment variables `BUILD_RNNT=0` and `BUILD_CTC_DECODER=0`, respectively.
```bash
# Linux
...
```
# Flashlight Decoder Binding
CTC decoder with KenLM and lexicon support, based on the [flashlight](https://github.com/flashlight/flashlight) decoder implementation and the fairseq [KenLMDecoder](https://github.com/pytorch/fairseq/blob/fcca32258c8e8bcc9f9890bf4714fa2f96b6b3e1/examples/speech_recognition/new/decoders/flashlight_decoder.py#L53) Python wrapper.
## Setup
### Build torchaudio with decoder support
```
BUILD_CTC_DECODER=1 python setup.py develop
```
## Usage
```py
from torchaudio.prototype.ctc_decoder import kenlm_lexicon_decoder
decoder = kenlm_lexicon_decoder(args...)
results = decoder(emissions)  # list of shape (B, nbest); each entry has "tokens", "score", and "words" keys
best_transcripts = [" ".join(results[i][0].words).strip() for i in range(B)]
```
## Required Files
- tokens: the tokens for which the acoustic model generates probabilities
- lexicon: mapping between words and their corresponding spellings
- language model: n-gram KenLM model
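As a hypothetical illustration of the tokens and lexicon formats (the actual token set and lexicon depend on your acoustic model and corpus), each lexicon line is a word followed by its space-separated spelling:

```python
# Made-up file contents illustrating the expected formats.
tokens_txt = "\n".join(["-", "|", "e", "t", "a"])  # one token per line

lexicon_txt = "\n".join([
    "eat e a t |",  # word followed by its space-separated spelling
    "tea t e a |",
])

def parse_lexicon(text):
    """Map each word to its list of spelling tokens."""
    lexicon = {}
    for line in text.splitlines():
        word, *spelling = line.split()
        lexicon[word] = spelling
    return lexicon

print(parse_lexicon(lexicon_txt)["eat"])  # ['e', 'a', 't', '|']
```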
## Experiment Results
Word error rates (WER) on LibriSpeech dev-other and test-other using pretrained [Wav2Vec2](https://arxiv.org/pdf/2006.11477.pdf) models of the BASE configuration.
| Model | Decoder | dev-other | test-other | beam search params |
| ----------- | ---------- | ----------- | ---------- |-------------------------------------------- |
| BASE_10M    | Greedy     | 51.6        | 51.0       |                                             |
| | 4-gram LM | 15.95 | 15.9 | LM weight=3.23, word score=-0.26, beam=1500 |
| BASE_100H | Greedy | 13.6 | 13.3 | |
| | 4-gram LM | 8.5 | 8.8 | LM weight=2.15, word score=-0.52, beam=50 |
| BASE_960H | Greedy | 8.9 | 8.4 | |
| | 4-gram LM | 6.3 | 6.4 | LM weight=1.74, word score=0.52, beam=50 |
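The `LM weight` and `word score` parameters in the table control how scores are combined during beam search: in flashlight-style lexicon decoding, a hypothesis score is roughly the acoustic score plus `lm_weight` times the language-model score plus `word_score` for each emitted word. A toy sketch with invented numbers:

```python
# Hedged sketch of flashlight-style hypothesis scoring; the scores
# below are invented for illustration, not taken from a real decode.
def hypothesis_score(am_score, lm_score, num_words, lm_weight=1.74, word_score=0.52):
    """Combine acoustic and LM scores the way the tuned parameters suggest."""
    return am_score + lm_weight * lm_score + word_score * num_words

# e.g. a 3-word hypothesis with acoustic score -12.0 and LM score -4.0
print(hypothesis_score(am_score=-12.0, lm_score=-4.0, num_words=3))
```

A higher `lm_weight` shifts the balance toward the language model, while a positive `word_score` counteracts the LM's bias toward shorter transcripts.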
@@ -181,15 +181,17 @@ def kenlm_lexicon_decoder(
Builds Ken LM CTC Lexicon Decoder with given parameters
Args:
lexicon (str): lexicon file containing the possible words and corresponding spellings.
    Each line consists of a word and its space separated spelling
tokens (str or List[str]): file or list containing valid tokens. If using a file, the expected
    format is for tokens mapping to the same index to be on the same line
kenlm (str): file containing language model
nbest (int, optional): number of best decodings to return (Default: 1)
beam_size (int, optional): max number of hypos to hold after each decode step (Default: 50)
beam_size_token (int, optional): max number of tokens to consider at each decode step.
    If None, it is set to the total number of tokens (Default: None)
beam_threshold (float, optional): threshold for pruning hypothesis (Default: 50)
lm_weight (float, optional): weight of language model (Default: 2)
word_score (float, optional): word insertion score (Default: 0)
unk_score (float, optional): unknown word insertion score (Default: -inf)
sil_score (float, optional): silence insertion score (Default: 0)
@@ -200,6 +202,14 @@ def kenlm_lexicon_decoder(
Returns:
    KenLMLexiconDecoder: decoder
Example
>>> decoder = kenlm_lexicon_decoder(
>>> lexicon="lexicon.txt",
>>> tokens="tokens.txt",
>>> kenlm="kenlm.bin",
>>> )
>>> results = decoder(emissions) # List of shape (B, nbest) of Hypotheses
""" """
lexicon = _load_words(lexicon) lexicon = _load_words(lexicon)
word_dict = _create_word_dict(lexicon) word_dict = _create_word_dict(lexicon)
......