Commit 10cce198 authored by moto's avatar moto Committed by Facebook GitHub Bot

Update prototype documentations (#2108)

Summary:
### Change list

* Split the prototype documentation into separate pages.
* Add a new API reference section dedicated to prototypes.
* Hide the signature of KenLMLexiconDecoder constructor. (cc carolineechen )
  * https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.ctc_decoder.html#torchaudio.prototype.ctc_decoder.KenLMLexiconDecoder
* Hide the signature of RNNT constructor. (cc hwangjeff )
  * https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.rnnt.html#torchaudio.prototype.RNNT
* Tweak CTC tutorial
  * Replace hyperlinks to API reference with backlinks
  * Add `progress=False` to download

### Follow-up

The RNNT decoder and the CTC decoder return their own `Hypothesis` classes. When I tried to add the CTC decoder's `Hypothesis` to the documentation, the build process complained that the name is ambiguous.
I think the `Hypothesis` classes could be nested inside each decoder class (if TorchScript supports that), or given distinct names; in the latter case the interface of each `Hypothesis` has to be generic enough.
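A rough sketch of the renaming option (the class names and the RNNT fields below are hypothetical, not from this PR; only the CTC fields `tokens`/`words`/`score` match the existing namedtuple):

```python
from typing import List, NamedTuple


class CTCHypothesis(NamedTuple):
    """Hypothetical distinct name for the CTC decoder result."""
    tokens: List[int]  # predicted token IDs
    words: List[str]   # predicted words from the lexicon
    score: float       # decoding score


class RNNTHypothesis(NamedTuple):
    """Hypothetical distinct name for the RNN-T decoder result."""
    tokens: List[int]  # predicted token IDs
    score: float       # decoding score


# Distinct class names keep the Sphinx cross-references unambiguous,
# while the shared fields (tokens, score) keep the interfaces generic
# enough to handle hypotheses from either decoder uniformly.
hyp = CTCHypothesis(tokens=[1, 2, 3], words=["hello"], score=-1.5)
print(hyp.score)
```

`NamedTuple` subclasses are also TorchScript-compatible, which matters if the classes end up nested in scripted decoders.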

### Before

https://pytorch.org/audio/main/prototype.html

<img width="1390" alt="Screen Shot 2021-12-28 at 1 05 53 PM" src="https://user-images.githubusercontent.com/855818/147594425-6c7f8126-ab76-4edc-a616-a00901e7e9ef.png">

### After

https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.html

<img width="1202" alt="Screen Shot 2021-12-28 at 8 37 35 PM" src="https://user-images.githubusercontent.com/855818/147619281-8152b1ae-e127-40b2-a944-dc11b114b629.png">

https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.rnnt.html

<img width="1415" alt="Screen Shot 2021-12-28 at 8 38 27 PM" src="https://user-images.githubusercontent.com/855818/147619331-077b55b5-c5e9-47ab-bfe6-873e41c738c8.png">

https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.ctc_decoder.html

<img width="1417" alt="Screen Shot 2021-12-28 at 8 39 04 PM" src="https://user-images.githubusercontent.com/855818/147619364-63df3457-a4b2-4223-973f-f4301bd45280.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2108

Reviewed By: hwangjeff, carolineechen, nateanl

Differential Revision: D33340816

Pulled By: mthrok

fbshipit-source-id: 870edfadbe41d6f8abaf78fdb7017b3980dfe187
parent 72a98a86
@@ -48,7 +48,17 @@ API References
    compliance.kaldi
    kaldi_io
    utils
+
+Prototype API References
+------------------------
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Prototype API Reference
+
    prototype
+   prototype.rnnt
+   prototype.ctc_decoder

 Getting Started
 ---------------
...
torchaudio.prototype.ctc_decoder
================================

.. currentmodule:: torchaudio.prototype.ctc_decoder

Decoder Class
-------------

KenLMLexiconDecoder
~~~~~~~~~~~~~~~~~~~

.. autoclass:: KenLMLexiconDecoder

   .. automethod:: __call__

   .. automethod:: idxs_to_tokens

Factory Function
----------------

kenlm_lexicon_decoder
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: kenlm_lexicon_decoder
torchaudio.prototype.rnnt
=========================

.. py:module:: torchaudio.prototype
.. currentmodule:: torchaudio.prototype

Model Classes
-------------

Conformer
~~~~~~~~~

.. autoclass:: Conformer

   .. automethod:: forward

Emformer
~~~~~~~~

.. autoclass:: Emformer

   .. automethod:: forward

   .. automethod:: infer

RNNT
~~~~

.. autoclass:: RNNT

   .. automethod:: forward

   .. automethod:: transcribe_streaming

   .. automethod:: transcribe

   .. automethod:: predict

   .. automethod:: join

Model Factory Functions
-----------------------

emformer_rnnt_base
~~~~~~~~~~~~~~~~~~

.. autofunction:: emformer_rnnt_base

emformer_rnnt_model
~~~~~~~~~~~~~~~~~~~

.. autofunction:: emformer_rnnt_model

Decoder Classes
---------------

RNNTBeamSearch
~~~~~~~~~~~~~~

.. autoclass:: RNNTBeamSearch

   .. automethod:: forward

   .. automethod:: infer

Hypothesis
~~~~~~~~~~

.. autoclass:: Hypothesis

Pipeline Primitives (Pre-trained Models)
----------------------------------------

RNNTBundle
~~~~~~~~~~

.. autoclass:: RNNTBundle
   :members: sample_rate, n_fft, n_mels, hop_length, segment_length, right_context_length

   .. automethod:: get_decoder

   .. automethod:: get_feature_extractor

   .. automethod:: get_streaming_feature_extractor

   .. automethod:: get_token_processor

.. autoclass:: torchaudio.prototype::RNNTBundle.FeatureExtractor
   :special-members: __call__

.. autoclass:: torchaudio.prototype::RNNTBundle.TokenProcessor
   :special-members: __call__

EMFORMER_RNNT_BASE_LIBRISPEECH
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. container:: py attribute

   .. autodata:: EMFORMER_RNNT_BASE_LIBRISPEECH
      :no-value:

References
----------

.. footbibliography::
 .. role:: hidden
    :class: hidden-section

 torchaudio.prototype
 ====================

 .. py:module:: torchaudio.prototype
 .. currentmodule:: torchaudio.prototype

 ``torchaudio.prototype`` provides prototype features;
-see `here <https://pytorch.org/audio>`_ for more information on prototype features.
-The module is available only within nightly builds and must be imported
-explicitly, e.g. ``import torchaudio.prototype``.
+they are at an early stage for feedback and testing.
+Their interfaces might be changed without prior notice.

-Conformer
-~~~~~~~~~
-.. autoclass:: Conformer
-   .. automethod:: forward
-
-Emformer
-~~~~~~~~
-.. autoclass:: Emformer
-   .. automethod:: forward
-   .. automethod:: infer
-
-RNNT
-~~~~
-.. autoclass:: RNNT
-   .. automethod:: forward
-   .. automethod:: transcribe_streaming
-   .. automethod:: transcribe
-   .. automethod:: predict
-   .. automethod:: join
-
-emformer_rnnt_base
-~~~~~~~~~~~~~~~~~~
-.. autofunction:: emformer_rnnt_base
-
-emformer_rnnt_model
-~~~~~~~~~~~~~~~~~~~
-.. autofunction:: emformer_rnnt_model
-
-RNNTBeamSearch
-~~~~~~~~~~~~~~
-.. autoclass:: RNNTBeamSearch
-   .. automethod:: forward
-   .. automethod:: infer
-
-Hypothesis
-~~~~~~~~~~
-.. autoclass:: Hypothesis
-
-RNNTBundle
-~~~~~~~~~~
-.. autoclass:: RNNTBundle
-   :members: sample_rate, n_fft, n_mels, hop_length, segment_length, right_context_length
-   .. automethod:: get_decoder
-   .. automethod:: get_feature_extractor
-   .. automethod:: get_streaming_feature_extractor
-   .. automethod:: get_token_processor
-.. autoclass:: torchaudio.prototype::RNNTBundle.FeatureExtractor
-   :special-members: __call__
-.. autoclass:: torchaudio.prototype::RNNTBundle.TokenProcessor
-   :special-members: __call__
-
-EMFORMER_RNNT_BASE_LIBRISPEECH
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autodata:: EMFORMER_RNNT_BASE_LIBRISPEECH
-   :no-value:
-
-KenLMLexiconDecoder
-~~~~~~~~~~~~~~~~~~~
-.. currentmodule:: torchaudio.prototype.ctc_decoder
-.. autoclass:: KenLMLexiconDecoder
-   .. automethod:: __call__
-   .. automethod:: idxs_to_tokens
-
-kenlm_lexicon_decoder
-~~~~~~~~~~~~~~~~~~~~~
-.. currentmodule:: torchaudio.prototype.ctc_decoder
-.. autoclass:: kenlm_lexicon_decoder
-
-References
-~~~~~~~~~~
-.. footbibliography::
+Most modules of prototypes are excluded from release.
+Please refer to `here <https://pytorch.org/audio>`_ for
+more information on prototype features.
+
+The modules under ``torchaudio.prototype`` must be
+imported explicitly, e.g.
+
+.. code-block:: python
+
+   import torchaudio.prototype.rnnt
+
+.. toctree::
+
+   prototype.rnnt
+   prototype.ctc_decoder
...
@@ -50,7 +50,7 @@ import torchaudio
 # We use the pretrained `Wav2Vec 2.0 <https://arxiv.org/abs/2006.11477>`__
 # Base model that is finetuned on 10 min of the `LibriSpeech
 # dataset <http://www.openslr.org/12>`__, which can be loaded in using
-# ``torchaudio.pipelines``. For more detail on running Wav2Vec 2.0 speech
+# py:func:`torchaudio.pipelines`. For more detail on running Wav2Vec 2.0 speech
 # recognition pipelines in torchaudio, please refer to `this
 # tutorial <https://pytorch.org/audio/main/tutorials/speech_recognition_pipeline_tutorial.html>`__.
 #
...
@@ -161,12 +161,11 @@ torch.hub.download_url_to_file(kenlm_url, kenlm_file)
 # Construct Beam Search Decoder
 # -----------------------------
 #
-# The decoder can be constructed using the ``kenlm_lexicon_decoder``
-# factory function from ``torchaudio.prototype.ctc_decoder``. In addition
-# to the previously mentioned components, it also takes in various beam
-# search decoding parameters and token/word parameters. The full list of
-# parameters can be found
-# `here <https://pytorch.org/audio/main/prototype.html#kenlm-lexicon-decoder>`__.
+# The decoder can be constructed using the
+# :py:func:`torchaudio.prototype.ctc_decoder.kenlm_lexicon_decoder`
+# factory function.
+# In addition to the previously mentioned components, it also takes in
+# various beam search decoding parameters and token/word parameters.
 #
 from torchaudio.prototype.ctc_decoder import kenlm_lexicon_decoder
...
@@ -190,7 +189,7 @@ beam_search_decoder = kenlm_lexicon_decoder(
 # --------------
 #
 # For comparison against the beam search decoder, we also construct a
-# basic greedy decoder.\ **bold text**
+# basic greedy decoder.
 #
...
...
@@ -24,6 +24,14 @@ Hypothesis = namedtuple("Hypothesis", ["tokens", "words", "score"])
 class KenLMLexiconDecoder:
+    """torchaudio.prototype.ctc_decoder.KenLMLexiconDecoder()
+
+    Note:
+        To build the decoder, please use factory function
+        :py:func:`kenlm_lexicon_decoder`.
+    """
+
     def __init__(
         self,
         nbest: int,
...
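The docstring added above is what hides the constructor signature in the rendered docs: Sphinx autodoc (with its default `autodoc_docstring_signature` setting) treats a first docstring line of the form `Name(args)` as the signature to display, so `KenLMLexiconDecoder()` renders an empty constructor. A minimal standalone illustration of the convention (`beam_size`/`nbest` here are illustrative stand-ins, not the full real argument list):

```python
class KenLMLexiconDecoder:
    """KenLMLexiconDecoder()

    Note:
        To build the decoder, please use the factory function.
    """

    def __init__(self, nbest: int, beam_size: int) -> None:
        # Internal constructor arguments that should not appear in the
        # generated documentation.
        self.nbest = nbest
        self.beam_size = beam_size


# The override only affects how autodoc renders the signature; at runtime
# the class still takes its real arguments.
first_line = KenLMLexiconDecoder.__doc__.splitlines()[0]
print(first_line)
```

This keeps internal-only constructor parameters out of the public API page while leaving the class itself unchanged.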
...
@@ -426,7 +426,9 @@ class _Joiner(torch.nn.Module):
 class RNNT(torch.nn.Module):
-    r"""Recurrent neural network transducer (RNN-T) model.
+    r"""torchaudio.prototype.rnnt.RNNT()
+
+    Recurrent neural network transducer (RNN-T) model.

     Note:
         To build the model, please use one of the factory functions.
...